
Shark Tank is a renowned TV show where entrepreneurs pitch their ideas to wealthy investors, known as sharks, aiming to secure investment for their startups. Our analysis focuses on understanding the dynamics of successful pitches on the show by utilizing data from two key sources: the "Shark Tank US Dataset," covering seasons 1 to 14, and the "All Shark Tank Pitches Dataset," spanning eight seasons. These datasets provide valuable insights into the decision-making of investors and pitch strategies employed by the entrepreneurs, allowing us to uncover patterns and trends.
Our investigation explores the temporal patterns of successful pitches, the co-investment preferences among the sharks, the industries attracting the most capital, the factors influencing the sharks' decisions and the impact of guests on the show. The goal is to examine how the show has progressed over the years, unveil the strategies behind successful pitches, understand the Sharks' preferences and explore how various factors affect a pitch's success.
We take care with how we handle the data, checking every transformation, which gives us a solid base for digging into various trends. In the analysis we get into the details of investment patterns, using techniques such as TF-IDF and sentiment analysis.
We zoom in on the lifestyle/home sector, going beyond the usual summaries to get a closer look at what is happening in that industry. We are not just crunching numbers; we also look at the impact of guest sharks, adding a human element that is often missed in these kinds of analyses.
Our approach is hands-on, combining statistical analysis, visualization and text analysis, which is why the data analysis forms the bulk of this project.
To sum it up, our project isn't just about numbers and trends; it's about real insights. We peel back the layers of the data, as in the Doorbot example, to build a solid basis for making smart decisions and to point to interesting avenues for future research.
Question 1
Can we identify temporal patterns in pitch success on the show and how do they evolve over the seasons, including viewership trends?
Understanding the temporal dynamics of pitch success is crucial for entrepreneurs seeking optimal timing for their Shark Tank appearance. Analyzing seasonal and monthly trends provides insights into the factors influencing success rates and allows entrepreneurs to strategize effectively.
Question 2
Are there statistically significant co-investment patterns among the sharks, revealing insights into their investment strategies and price discrimination?
Exploring co-investment patterns among the sharks sheds light on collaborative approaches and potential price discrimination. Entrepreneurs can benefit from understanding these dynamics by tailoring their pitches to the sharks' preferred investment strategies and by targeting multiple sharks based on their co-investment habits and portfolios.
Question 3
Can we analyze trends in pitches on the show to identify the sectors with the highest capital raised and assess how investment trends affect business pitches?
Examining the patterns in capital raised across various sectors offers valuable insights for entrepreneurs seeking to tailor their pitches to current investment trends. Grasping the influence of these trends on business proposals, along with historical data on sector-specific investments by Sharks, proves beneficial for strategic planning.
Question 4
What factors in business pitches influence the sharks' equity demands, and to what extent do the business descriptions affect the likelihood of securing a deal?
It is essential for entrepreneurs to carefully consider the factors that affect equity demands during negotiations. By examining aspects such as valuation and revenue projections, and by refining their negotiation skills, entrepreneurs can take a more strategic and informed approach to equity discussions. This understanding helps them make better decisions, improving their chances of favourable outcomes in negotiations and supporting the growth of their ventures.
Question 5
Does the presence of specific investors or guests on the show influence entrepreneurs' deal success and viewership, and who has the most significant impact on both?
Analyzing the influence of specific investors or guests is crucial to entrepreneurs for tailoring pitches. This information could be valuable for the producers of Shark Tank too, as it indicates which guests have a more substantial and consistent influence on the show's success in terms of attracting and retaining viewers.
We will be making use of two datasets. The first dataset consolidates data from seasons 1 to 14 of the American business reality series, Shark Tank. It consists of 1274 rows and 50 columns with each row representing a different pitch made on Shark Tank. These fields present diverse details regarding the pitch, entrepreneurs, and deals formulated within the episodes.
The second dataset provides a record of pitches made on Shark Tank. It contains 706 rows and 5 columns, with each row representing a pitch. This dataset spans eight seasons, offering insights into the factors that influence the success of these pitches and the subsequent investment decisions by the sharks. It contains the following headers: Season_Epi_code, Pitched_Business_Identifier, Pitched_Business_Desc, Deal_Status and Deal_Shark.
This dataset complements our previous dataset, further enhancing our ability to analyze and understand the dynamics of pitches on "Shark Tank." With this additional data, we can explore how specific sharks' interests and the business descriptions influence investment decisions, helping us build a more comprehensive picture of the show's outcomes.
Dataset 1 - https://www.kaggle.com/datasets/thirumani/shark-tank-us-dataset/data
Dataset 2 - https://www.kaggle.com/datasets/neiljs/all-shark-tank-us-pitches-deals
Both datasets are available on Kaggle at the links above. We load copies of the CSV files hosted in the project's GitHub repository directly into our Jupyter Notebook for analysis.
Environment Setup:
import pandas as pd
import numpy as np
import re
from datetime import datetime
import matplotlib.pyplot as plt
import imageio #reading and writing image data
import seaborn as sns
import itertools
from PIL import Image #open and save images
import plotly.graph_objects as go #make interactive graphs
import plotly.express as px
from plotly.subplots import make_subplots #display multiple plots in one figure
import plotly.figure_factory as ff
# To correctly show the plotly graphs in HTML
import plotly.io as pio
#pio.renderers.default = 'notebook'
import nltk #used to analyze data written by humans
from nltk.stem import WordNetLemmatizer #make sensible words out of uncleaned data
from nltk.corpus import stopwords #filter out common words
from sklearn.feature_extraction.text import TfidfVectorizer #convert raw documents to a TF-IDF matrix
from wordcloud import WordCloud #generate a word cloud
from textblob import TextBlob #processing textual data
import textstat #calculating textual statistics including readability scores
nltk.download('stopwords') #one time installation
nltk.download('wordnet') #one time installation
# import libraries needed for getting images from the web (PIL.Image is already imported above)
import requests
from io import BytesIO
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/siddharthkulkarni/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/siddharthkulkarni/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
print("Notebook last executed on:", datetime.now().strftime("%m/%d/%Y, %H:%M:%S"))
Notebook last executed on: 12/06/2023, 19:52:00
# import data from csv file present in GitHub repository using pandas
path = 'https://raw.githubusercontent.com/JoyceGaoH/project-shark/main/Shark%20Tank%20US%20dataset.csv' # save github repository url
df_shark_tank_1 = pd.read_csv(path)
df_shark_tank_1.head()
|  | Season Number | Season Start | Season End | Episode Number | Pitch Number | Original Air Date | Startup Name | Industry | Business Description | Pitchers Gender | ... | Kevin O Leary Investment Equity | Guest Investment Amount | Guest Investment Equity | Guest Name | Barbara Corcoran Present | Mark Cuban Present | Lori Greiner Present | Robert Herjavec Present | Daymond John Present | Kevin O Leary Present |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 9-Aug-09 | 5-Feb-10 | 1 | 1 | 9-Aug-09 | AvaTheElephant | Health/Wellness | Ava The Elephant - Baby and Child Care | Female | ... | NaN | NaN | NaN | NaN | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 1 | 1 | 9-Aug-09 | 5-Feb-10 | 1 | 2 | 9-Aug-09 | Mr.Tod'sPieFactory | Food and Beverage | Mr. Tod's Pie Factory - Specialty Food | Male | ... | NaN | NaN | NaN | NaN | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 2 | 1 | 9-Aug-09 | 5-Feb-10 | 1 | 3 | 9-Aug-09 | Wispots | Business Services | Wispots - Consumer Services | Male | ... | NaN | NaN | NaN | NaN | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 3 | 1 | 9-Aug-09 | 5-Feb-10 | 1 | 4 | 9-Aug-09 | CollegeFoxesPackingBoxes | Lifestyle/Home | College Foxes Packing Boxes - Consumer Services | Male | ... | NaN | NaN | NaN | NaN | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 4 | 1 | 9-Aug-09 | 5-Feb-10 | 1 | 5 | 9-Aug-09 | IonicEar | Software/Tech | Ionic Ear - Novelties | Male | ... | NaN | NaN | NaN | NaN | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
5 rows × 50 columns
path2 = 'https://raw.githubusercontent.com/JoyceGaoH/project-shark/main/Sharktankpitchesdeals.csv'
df_shark_tank_2 = pd.read_csv(path2)
df_shark_tank_2.head()
|  | Season_Epi_code | Pitched_Business_Identifier | Pitched_Business_Desc | Deal_Status | Deal_Shark |
|---|---|---|---|---|---|
| 0 | 826 | Bridal Buddy | a functional slip worn under a wedding gown th... | 1 | KOL+LG |
| 1 | 826 | Laid Brand | hair-care products made with pheromones . Laid... | 0 | NaN |
| 2 | 826 | Rocketbook | a notebook that can scan contents to cloud ser... | 0 | NaN |
| 3 | 826 | Wine & Design | painting classes with wine served . Wine & Des... | 1 | KOL |
| 4 | 824 | Peoples Design | a mixing bowl with a built-in scoop . Peoples ... | 1 | LG |
df_shark_tank_1.shape
(1274, 50)
df_shark_tank_2.shape
(706, 5)
The second dataset has fewer observations than the first. This is expected, since the second dataset has records for 8 seasons, whereas the first has records for 14.
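Dataset 2 has no explicit season column. Its Season_Epi_code values (101 to 826 in describe()) look like season * 100 + episode, e.g. 826 for season 8, episode 26; this encoding is our assumption rather than something documented here. Under that assumption, integer division recovers the season, which is one way to confirm the 8-season coverage. A minimal sketch on hypothetical sample codes:

```python
import pandas as pd

# Hypothetical sample of Season_Epi_code values; in the notebook this is df_shark_tank_2['Season_Epi_code'].
codes = pd.Series([101, 405, 523, 709, 826, 826], name="Season_Epi_code")

# Assumption: each code is season * 100 + episode (e.g. 826 -> season 8, episode 26).
season = codes // 100
episode = codes % 100

print(sorted(season.unique().tolist()))  # [1, 4, 5, 7, 8]
print(season.nunique())                  # number of distinct seasons in the sample
```

On the full column, `(df_shark_tank_2['Season_Epi_code'] // 100).nunique()` would return 8 if the assumption holds.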
df_shark_tank_1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1274 entries, 0 to 1273
Data columns (total 50 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | Season Number | 1274 non-null | int64 |
| 1 | Season Start | 1274 non-null | object |
| 2 | Season End | 1274 non-null | object |
| 3 | Episode Number | 1274 non-null | int64 |
| 4 | Pitch Number | 1274 non-null | int64 |
| 5 | Original Air Date | 1274 non-null | object |
| 6 | Startup Name | 1274 non-null | object |
| 7 | Industry | 1274 non-null | object |
| 8 | Business Description | 1274 non-null | object |
| 9 | Pitchers Gender | 1267 non-null | object |
| 10 | Pitchers City | 502 non-null | object |
| 11 | Pitchers State | 746 non-null | object |
| 12 | Pitchers Average Age | 338 non-null | object |
| 13 | Entrepreneur Names | 779 non-null | object |
| 14 | Company Website | 516 non-null | object |
| 15 | Multiple Entrepreneurs | 847 non-null | float64 |
| 16 | US Viewership | 1274 non-null | float64 |
| 17 | Original Ask Amount | 1274 non-null | int64 |
| 18 | Original Offered Equity | 1274 non-null | float64 |
| 19 | Valuation Requested | 1274 non-null | int64 |
| 20 | Got Deal | 1274 non-null | int64 |
| 21 | Total Deal Amount | 765 non-null | float64 |
| 22 | Total Deal Equity | 765 non-null | float64 |
| 23 | Deal Valuation | 765 non-null | float64 |
| 24 | Number of sharks in deal | 765 non-null | float64 |
| 25 | Investment Amount Per Shark | 765 non-null | float64 |
| 26 | Equity Per Shark | 765 non-null | float64 |
| 27 | Royalty Deal | 75 non-null | float64 |
| 28 | Loan | 52 non-null | float64 |
| 29 | Barbara Corcoran Investment Amount | 120 non-null | float64 |
| 30 | Barbara Corcoran Investment Equity | 120 non-null | float64 |
| 31 | Mark Cuban Investment Amount | 230 non-null | float64 |
| 32 | Mark Cuban Investment Equity | 230 non-null | float64 |
| 33 | Lori Greiner Investment Amount | 199 non-null | float64 |
| 34 | Lori Greiner Investment Equity | 199 non-null | float64 |
| 35 | Robert Herjavec Investment Amount | 121 non-null | float64 |
| 36 | Robert Herjavec Investment Equity | 121 non-null | float64 |
| 37 | Daymond John Investment Amount | 111 non-null | float64 |
| 38 | Daymond John Investment Equity | 111 non-null | float64 |
| 39 | Kevin O Leary Investment Amount | 117 non-null | float64 |
| 40 | Kevin O Leary Investment Equity | 117 non-null | float64 |
| 41 | Guest Investment Amount | 105 non-null | float64 |
| 42 | Guest Investment Equity | 105 non-null | float64 |
| 43 | Guest Name | 105 non-null | object |
| 44 | Barbara Corcoran Present | 898 non-null | float64 |
| 45 | Mark Cuban Present | 901 non-null | float64 |
| 46 | Lori Greiner Present | 901 non-null | float64 |
| 47 | Robert Herjavec Present | 897 non-null | float64 |
| 48 | Daymond John Present | 898 non-null | float64 |
| 49 | Kevin O Leary Present | 898 non-null | float64 |

dtypes: float64(31), int64(6), object(13)
memory usage: 497.8+ KB
A deep-dive into the dataset's columns:
The journey begins by navigating through the different seasons, each with its own set of challenges, successes and lessons, showing how the landscape of entrepreneurial initiatives evolves over time.
The world of entrepreneurship is full of diverse personalities, each with a unique history. 'Pitchers Gender', 'Pitchers City' and 'Pitchers State' paint a picture of the entrepreneurs, highlighting the gender and geographic diversity of the people looking to make an impact.
The financial columns (ask amount, offered equity, requested valuation and the resulting deal terms) capture the strategic financial moves and skillful negotiations entrepreneurs use to secure investment.
Shark engagement, recorded in the per-shark 'Present' and investment columns, highlights the strategic partnerships that investors and entrepreneurs build, which affect business prospects.
The outcomes of the pitches offer insight into the business profiles as well as the real-world difficulties entrepreneurs face when seeking investment.
df_shark_tank_2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 706 entries, 0 to 705
Data columns (total 5 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | Season_Epi_code | 706 non-null | int64 |
| 1 | Pitched_Business_Identifier | 706 non-null | object |
| 2 | Pitched_Business_Desc | 706 non-null | object |
| 3 | Deal_Status | 706 non-null | int64 |
| 4 | Deal_Shark | 383 non-null | object |

dtypes: int64(2), object(3)
memory usage: 27.7+ KB
A brief look into the 2nd dataset's columns:
df_shark_tank_1.describe().applymap('{:,.2f}'.format)
|  | Season Number | Episode Number | Pitch Number | Multiple Entrepreneurs | US Viewership | Original Ask Amount | Original Offered Equity | Valuation Requested | Got Deal | Total Deal Amount | ... | Kevin O Leary Investment Amount | Kevin O Leary Investment Equity | Guest Investment Amount | Guest Investment Equity | Barbara Corcoran Present | Mark Cuban Present | Lori Greiner Present | Robert Herjavec Present | Daymond John Present | Kevin O Leary Present |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1,274.00 | 1,274.00 | 1,274.00 | 847.00 | 1,274.00 | 1,274.00 | 1,274.00 | 1,274.00 | 1,274.00 | 765.00 | ... | 117.00 | 117.00 | 105.00 | 105.00 | 898.00 | 901.00 | 901.00 | 897.00 | 898.00 | 898.00 |
| mean | 7.92 | 12.52 | 637.50 | 0.44 | 5.14 | 284,137.36 | 13.80 | 3,550,595.48 | 0.60 | 296,062.96 | ... | 240,747.86 | 15.11 | 212,293.65 | 15.59 | 0.56 | 0.90 | 0.75 | 0.88 | 0.66 | 0.96 |
| std | 3.72 | 7.47 | 367.92 | 0.50 | 1.48 | 359,005.10 | 8.64 | 5,878,462.11 | 0.49 | 358,828.25 | ... | 300,652.14 | 11.23 | 211,753.55 | 13.35 | 0.50 | 0.30 | 0.43 | 0.33 | 0.47 | 0.21 |
| min | 1.00 | 1.00 | 1.00 | 0.00 | 2.27 | 10,000.00 | 1.00 | 40,000.00 | 0.00 | 10,000.00 | ... | 20,000.00 | 0.00 | 20,000.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 25% | 5.00 | 6.00 | 319.25 | 0.00 | 3.85 | 100,000.00 | 10.00 | 666,667.00 | 0.00 | 100,000.00 | ... | 83,333.33 | 6.00 | 75,000.00 | 8.75 | 0.00 | 1.00 | 1.00 | 1.00 | 0.00 | 1.00 |
| 50% | 8.00 | 12.00 | 637.50 | 0.00 | 4.88 | 200,000.00 | 10.00 | 1,500,000.00 | 1.00 | 200,000.00 | ... | 150,000.00 | 10.00 | 125,000.00 | 11.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 75% | 11.00 | 19.00 | 955.75 | 1.00 | 6.39 | 350,000.00 | 20.00 | 4,000,000.00 | 1.00 | 350,000.00 | ... | 270,000.00 | 20.00 | 250,000.00 | 20.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| max | 14.00 | 29.00 | 1,274.00 | 1.00 | 8.64 | 5,000,000.00 | 100.00 | 100,000,000.00 | 1.00 | 5,000,000.00 | ... | 2,500,000.00 | 50.00 | 1,250,000.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
8 rows × 37 columns
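Requested valuations are high: the mean 'Valuation Requested' is about $3.55 million, more than twice the median of $1.5 million, with a maximum of $100 million. This indicates high expectations of, or confidence in, their businesses, even though many are likely at an early stage of development.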
The relatively high valuation requests might also reflect the entrepreneurs' understanding of negotiation dynamics on the show, where sharks often counter-offer with lower valuations.
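The 'Valuation Requested' column appears consistent with the ask amount divided by the offered equity fraction: for instance, the 25th-percentile value of 666,667 matches an ask of $100,000 for 15% equity. Treating that relationship as an assumption rather than documented fact, a minimal sketch of the calculation (implied_valuation is a hypothetical helper):

```python
def implied_valuation(ask_amount: float, offered_equity_pct: float) -> float:
    """Hypothetical helper: valuation implied by asking `ask_amount` dollars for `offered_equity_pct` percent."""
    if not 0 < offered_equity_pct <= 100:
        raise ValueError("offered equity must be a percentage in (0, 100]")
    # valuation = ask / (equity expressed as a fraction)
    return ask_amount / (offered_equity_pct / 100)


# $100,000 for 15% equity implies roughly $666,667.
print(round(implied_valuation(100_000, 15)))
```

On the real data, the same formula can be applied column-wise, e.g. `df_shark_tank_1['Original Ask Amount'] / (df_shark_tank_1['Original Offered Equity'] / 100)`, and compared with 'Valuation Requested' to test the assumption.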
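The average deal amount (about $296,000, computed over the 765 pitches with a recorded deal amount) is higher than both the median ask ($200,000) and the mean ask (about $284,000). This could imply that sharks are willing to invest more in ventures they see as highly promising, although part of the gap may simply reflect which pitches closed a deal. It might also suggest that entrepreneurs who ask for reasonable or slightly lower amounts are more likely to get a deal, possibly with better terms.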
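Because 'Total Deal Amount' is only populated for pitches that closed a deal, its mean is conditional on a deal, while the median ask covers every pitch. In addition, the cleaned df_st1 fills missing deal amounts with 0.0, which would drag a naive mean down. A minimal sketch that compares like with like, on hypothetical toy rows that reuse the dataset's column names:

```python
import pandas as pd

# Hypothetical toy rows; values are illustrative only.
df = pd.DataFrame({
    "Original Ask Amount": [100_000, 200_000, 250_000, 500_000, 50_000],
    "Got Deal":            [1, 0, 1, 1, 0],
    "Total Deal Amount":   [100_000, None, 300_000, 400_000, None],
})

median_ask_all = df["Original Ask Amount"].median()      # over every pitch
deals = df[df["Got Deal"] == 1]                           # only pitches that closed a deal
mean_deal = deals["Total Deal Amount"].mean()             # NaNs are skipped by default
median_ask_deals = deals["Original Ask Amount"].median()  # ask, restricted to the same pitches

# Pitfall: after fillna(0.0), non-deals count as zero-dollar deals.
naive_mean = df["Total Deal Amount"].fillna(0.0).mean()

print(f"median ask (all): {median_ask_all:,.0f}")
print(f"mean deal (deals only): {mean_deal:,.0f}")
print(f"median ask (deals only): {median_ask_deals:,.0f}")
print(f"naive zero-filled mean: {naive_mean:,.0f}")
```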
The average 'US Viewership' of about 5.14 million U.S. viewers highlights the show's popularity and wide appeal; note that describe() averages over pitch rows, so episodes with more pitches carry more weight. This popularity may be attributed to the educational and entertainment value the show provides, offering insights into entrepreneurship, investment negotiations and business strategies.
High viewership also means greater exposure for the businesses that pitch, which can be valuable in itself, independent of whether they secure a deal.
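Since the viewership figure is averaged over pitch rows, a per-season viewership trend (Question 1) is better computed per episode. The sketch below assumes that all pitches in an episode carry the same 'US Viewership' value and that 'Season Number' plus 'Episode Number' identify an episode; the toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows: one row per pitch, with the episode's viewership repeated on each pitch.
df = pd.DataFrame({
    "Season Number":  [1, 1, 2, 2, 2],
    "Episode Number": [1, 1, 1, 2, 2],
    "US Viewership":  [4.0, 4.0, 5.0, 6.0, 6.0],
})

# Mean over pitch rows: episodes with more pitches get more weight.
per_pitch = df.groupby("Season Number")["US Viewership"].mean()

# Keep one row per (season, episode) before averaging, so each episode counts once.
episodes = df.drop_duplicates(subset=["Season Number", "Episode Number"])
per_episode = episodes.groupby("Season Number")["US Viewership"].mean()

print(per_pitch.round(2))
print(per_episode)
```

In season 2 of the toy data, the per-pitch mean is pulled towards the episode with two pitches, while the per-episode mean weights both episodes equally.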
df_shark_tank_2.describe().applymap('{:,.2f}'.format)
|  | Season_Epi_code | Deal_Status |
|---|---|---|
| count | 706.00 | 706.00 |
| mean | 519.65 | 0.54 |
| std | 213.60 | 0.50 |
| min | 101.00 | 0.00 |
| 25% | 405.00 | 0.00 |
| 50% | 523.00 | 1.00 |
| 75% | 709.00 | 1.00 |
| max | 826.00 | 1.00 |
The .info() and .describe() outputs summarize most columns of both datasets, but on their own they reveal little beyond basic counts, ranges and data types.
Therefore, a deeper analysis is required with the help of visualizations.
# Frequency Counts for Categorical Variables
df_shark_tank_1['Industry'].value_counts()
| Industry | Count |
|---|---|
| Food and Beverage | 276 |
| Lifestyle/Home | 228 |
| Fashion/Beauty | 217 |
| Children/Education | 118 |
| Fitness/Sports/Outdoors | 113 |
| Health/Wellness | 65 |
| Software/Tech | 65 |
| Pet Products | 51 |
| Business Services | 37 |
| Media/Entertainment | 24 |
| Uncertain/Other | 18 |
| Automotive | 17 |
| Electronics | 15 |
| Green/CleanTech | 11 |
| Travel | 11 |
| Liquor/Alcohol | 8 |

Name: Industry, dtype: int64
Business pitches on Shark Tank fall into several categories, exposing patterns in investor interest and entrepreneurial focus. Food and Beverage is the most popular category, with 276 pitches, suggesting a strong inclination towards entrepreneurship in this industry, perhaps because of its relevance to consumers and wide market appeal. Lifestyle/Home (228) and Fashion/Beauty (217) follow closely, indicating a notable inclination towards consumer products and services that improve everyday life and personal appearance.
The notable presence of categories such as Children/Education (118) and Fitness/Sports/Outdoors (113) may indicate a cultural emphasis on wellness, health and education. Software/Tech and Health/Wellness are tied at 65 pitches each, which may suggest a balanced interest in health-related and technological progress. The lower volume of pitches in specialized categories such as Travel, Liquor/Alcohol and Green/CleanTech may be due to factors such as market size, perceived risk and investor expertise.
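Expressing the counts as shares makes the concentration clearer. Using the counts from the value_counts() output above, the three largest categories together make up more than half of all pitches. A short sketch:

```python
import pandas as pd

# Pitch counts per industry, copied from the value_counts() output above.
counts = pd.Series({
    "Food and Beverage": 276, "Lifestyle/Home": 228, "Fashion/Beauty": 217,
    "Children/Education": 118, "Fitness/Sports/Outdoors": 113, "Health/Wellness": 65,
    "Software/Tech": 65, "Pet Products": 51, "Business Services": 37,
    "Media/Entertainment": 24, "Uncertain/Other": 18, "Automotive": 17,
    "Electronics": 15, "Green/CleanTech": 11, "Travel": 11, "Liquor/Alcohol": 8,
}, name="Industry")

# Share of all pitches per industry; on the raw column this equals value_counts(normalize=True).
shares = counts / counts.sum()
cumulative = shares.cumsum()

print(f"Total pitches: {counts.sum()}")
print(f"Top 3 industries: {cumulative.iloc[2]:.1%} of pitches")
```

On the notebook data, `df_shark_tank_1['Industry'].value_counts(normalize=True)` gives the same shares directly.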
# Display the number of pitchers by gender and teams
gender_teams = df_shark_tank_1['Pitchers Gender'].value_counts()
print(gender_teams)
| Pitchers Gender | Count |
|---|---|
| Male | 703 |
| Female | 330 |
| Mixed Team | 234 |

Name: Pitchers Gender, dtype: int64
The gender distribution shows distinct patterns in the representation of entrepreneurs: male pitchers account for 703 pitches, a substantial majority over the 330 pitches by female pitchers, while 234 pitches come from mixed-gender teams. This draws attention to the gender gap in the entrepreneurial field, pointing to higher participation among men and possible obstacles for female entrepreneurs on platforms like this one. Mixed teams demonstrate cross-gender cooperation. The data is consistent with the gender dynamics and biases often reported in the investment and entrepreneurship fields, although counts alone cannot establish bias.
Null Value Analysis:
The isnull().sum() method is used to identify the total number of null values in each column of the datasets df_shark_tank_1 and df_shark_tank_2.
Data Type Conversion: Several columns in df_shark_tank_1 are converted to appropriate data types:
Guest investment amount and equity columns to float.
Date columns to datetime.
Season, episode, and pitch numbers to integers.
Startup name, industry, and business description to string.
The Multiple Entrepreneurs column is converted to a nullable integer, since it is a 0/1 flag.
Handling Missing Data:
Rows with null values in 'Pitchers Gender' are dropped.
Certain columns are dropped (e.g., 'Royalty Deal', 'Loan') due to irrelevance to the analysis.
Columns expected to contain textual information (e.g., 'Pitchers City', 'Entrepreneur Names') are filled with 'Unknown' when null.
Numeric columns are filled with 0.0 when null, indicating either a lack of investment or non-applicability of the metric.
Dataset Indexing, Modification and Merging:
The set_index() method is used to set 'Startup Name' as the index for df_st1 (modified df_shark_tank_1).
A subset of df_shark_tank_2 is created focusing on business identifiers and descriptions.
Company names are standardized by converting to lowercase and removing whitespace to facilitate merging.
A left merge is performed between df_st1 and the modified df_shark_tank_2 (df_shark_tank_3), ensuring the preservation of df_shark_tank_1's data.
Business descriptions from both datasets are combined, and the index is set to the standardized company name.
This thorough data processing and cleaning phase sets a strong foundation for the subsequent analysis, ensuring that it is conducted on reliable, well-structured data.
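Because a left merge silently leaves unmatched rows with NaN, it can help to check how many startups in dataset 1 actually found a description in dataset 2, and to guard against repeated names in dataset 2, which would duplicate rows. pandas' `merge` supports both through its `indicator` and `validate` parameters. A sketch on hypothetical toy frames standing in for df_st1 and df_shark_tank_3:

```python
import pandas as pd

# Hypothetical toy frames standing in for df_st1 and df_shark_tank_3 after name normalization.
left = pd.DataFrame({
    "Name": ["avatheelephant", "wispots", "ionicear", "examplestartup"],
    "Industry": ["Health/Wellness", "Business Services", "Software/Tech", "Food and Beverage"],
})
right = pd.DataFrame({
    "Name": ["avatheelephant", "wispots", "ionicear"],
    "Pitched_Business_Desc": ["desc a", "desc b", "desc c"],
})

# indicator=True adds a '_merge' column ('both' / 'left_only' / 'right_only');
# validate='many_to_one' raises MergeError if a Name repeats in `right`.
merged = left.merge(right, how="left", on="Name", indicator=True, validate="many_to_one")

unmatched = merged.loc[merged["_merge"] == "left_only", "Name"].tolist()
print(f"Matched {(merged['_merge'] == 'both').sum()} of {len(left)} rows; unmatched: {unmatched}")
```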
df_shark_tank_1.isnull().sum() # show the total number of null values per column
| Column | Null Count |
|---|---|
| Season Number | 0 |
| Season Start | 0 |
| Season End | 0 |
| Episode Number | 0 |
| Pitch Number | 0 |
| Original Air Date | 0 |
| Startup Name | 0 |
| Industry | 0 |
| Business Description | 0 |
| Pitchers Gender | 7 |
| Pitchers City | 772 |
| Pitchers State | 528 |
| Pitchers Average Age | 936 |
| Entrepreneur Names | 495 |
| Company Website | 758 |
| Multiple Entrepreneurs | 427 |
| US Viewership | 0 |
| Original Ask Amount | 0 |
| Original Offered Equity | 0 |
| Valuation Requested | 0 |
| Got Deal | 0 |
| Total Deal Amount | 509 |
| Total Deal Equity | 509 |
| Deal Valuation | 509 |
| Number of sharks in deal | 509 |
| Investment Amount Per Shark | 509 |
| Equity Per Shark | 509 |
| Royalty Deal | 1199 |
| Loan | 1222 |
| Barbara Corcoran Investment Amount | 1154 |
| Barbara Corcoran Investment Equity | 1154 |
| Mark Cuban Investment Amount | 1044 |
| Mark Cuban Investment Equity | 1044 |
| Lori Greiner Investment Amount | 1075 |
| Lori Greiner Investment Equity | 1075 |
| Robert Herjavec Investment Amount | 1153 |
| Robert Herjavec Investment Equity | 1153 |
| Daymond John Investment Amount | 1163 |
| Daymond John Investment Equity | 1163 |
| Kevin O Leary Investment Amount | 1157 |
| Kevin O Leary Investment Equity | 1157 |
| Guest Investment Amount | 1169 |
| Guest Investment Equity | 1169 |
| Guest Name | 1169 |
| Barbara Corcoran Present | 376 |
| Mark Cuban Present | 373 |
| Lori Greiner Present | 373 |
| Robert Herjavec Present | 377 |
| Daymond John Present | 376 |
| Kevin O Leary Present | 376 |

dtype: int64
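Raw null counts are easier to compare as percentages of the 1,274 rows. For example, 'Royalty Deal' is missing in 1,199 rows (about 94%) and 'Loan' in 1,222 (about 96%), which supports dropping them, whereas 'Pitchers Gender' is missing in only 7 rows. A minimal sketch on a hypothetical toy frame; the 50% cut-off is an arbitrary illustrative choice:

```python
import pandas as pd

# Hypothetical toy frame using a few of the dataset's column names.
df = pd.DataFrame({
    "Pitchers City": ["Atlanta", None, None, "Tampa"],
    "Got Deal": [1, 0, 1, 1],
    "Royalty Deal": [None, None, None, 1.0],
})

# isnull().mean() gives the fraction of missing values per column; scale to percent.
null_pct = (df.isnull().mean() * 100).sort_values(ascending=False)
print(null_pct)

# Columns missing in more than half the rows are flagged as drop candidates.
drop_candidates = null_pct[null_pct > 50].index.tolist()
print(drop_candidates)
```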
# Change columns to float type
df_shark_tank_1['Guest Investment Amount'] = df_shark_tank_1['Guest Investment Amount'].astype(float)
df_shark_tank_1['Guest Investment Equity'] = df_shark_tank_1['Guest Investment Equity'].astype(float)
# Change columns to datetime type
df_shark_tank_1["Season Start"]=pd.to_datetime(df_shark_tank_1["Season Start"])
df_shark_tank_1["Season End"]=pd.to_datetime(df_shark_tank_1["Season End"])
df_shark_tank_1["Original Air Date"]=pd.to_datetime(df_shark_tank_1["Original Air Date"])
# Change columns to integer type
df_shark_tank_1['Season Number'] = df_shark_tank_1['Season Number'].astype(pd.Int32Dtype())
df_shark_tank_1['Episode Number'] = df_shark_tank_1['Episode Number'].astype(pd.Int32Dtype())
df_shark_tank_1['Pitch Number'] = df_shark_tank_1['Pitch Number'].astype(pd.Int32Dtype())
# Change columns to string type
df_shark_tank_1['Startup Name'] = df_shark_tank_1['Startup Name'].astype(str)
df_shark_tank_1['Industry'] = df_shark_tank_1['Industry'].astype(str)
df_shark_tank_1['Business Description'] = df_shark_tank_1['Business Description'].astype(str)
df_shark_tank_1['Multiple Entrepreneurs'] = df_shark_tank_1['Multiple Entrepreneurs'].astype(pd.Int32Dtype()) # integer type
# this column has only 7 null values, so it makes sense to drop those observations
df_shark_tank_1.dropna(subset=['Pitchers Gender'],inplace=True)
# dropping the columns 'Royalty Deal' and 'Loan' since they are not relevant to our analysis
df_shark_tank_1.drop(['Royalty Deal','Loan'], axis=1, inplace=True)
# filling columns with unknown which have null values
columns_to_fill_unknown=['Pitchers City', 'Pitchers State', 'Pitchers Average Age','Entrepreneur Names', 'Guest Name', 'Company Website']
# filling columns with value as 0.0 which have null values
columns_to_fill_0=['Multiple Entrepreneurs', 'Total Deal Amount', 'Total Deal Equity',
'Deal Valuation', 'Number of sharks in deal', 'Investment Amount Per Shark', 'Equity Per Shark',
'Barbara Corcoran Investment Equity',
'Mark Cuban Investment Equity', 'Lori Greiner Investment Equity',
'Robert Herjavec Investment Equity',
'Daymond John Investment Equity', 'Kevin O Leary Investment Equity',
'Guest Investment Equity', 'Barbara Corcoran Present', 'Mark Cuban Present',
'Lori Greiner Present', 'Robert Herjavec Present', 'Daymond John Present', 'Kevin O Leary Present']
The per-shark and guest investment amount columns (e.g. 'Mark Cuban Investment Amount', 'Guest Investment Amount') are deliberately not filled, even though they contain many null values, since keeping them as nulls aids analysis later.
# filling null values: 0.0 for the numeric columns, 'Unknown' for the text columns
fill_values = {**{col: 0.0 for col in columns_to_fill_0}, **{col: 'Unknown' for col in columns_to_fill_unknown}}
df_st1 = df_shark_tank_1.fillna(value=fill_values)
df_st1.head()
|  | Season Number | Season Start | Season End | Episode Number | Pitch Number | Original Air Date | Startup Name | Industry | Business Description | Pitchers Gender | ... | Kevin O Leary Investment Equity | Guest Investment Amount | Guest Investment Equity | Guest Name | Barbara Corcoran Present | Mark Cuban Present | Lori Greiner Present | Robert Herjavec Present | Daymond John Present | Kevin O Leary Present |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2009-08-09 | 2010-02-05 | 1 | 1 | 2009-08-09 | AvaTheElephant | Health/Wellness | Ava The Elephant - Baby and Child Care | Female | ... | 0.0 | NaN | 0.0 | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 1 | 1 | 2009-08-09 | 2010-02-05 | 1 | 2 | 2009-08-09 | Mr.Tod'sPieFactory | Food and Beverage | Mr. Tod's Pie Factory - Specialty Food | Male | ... | 0.0 | NaN | 0.0 | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 2 | 1 | 2009-08-09 | 2010-02-05 | 1 | 3 | 2009-08-09 | Wispots | Business Services | Wispots - Consumer Services | Male | ... | 0.0 | NaN | 0.0 | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 3 | 1 | 2009-08-09 | 2010-02-05 | 1 | 4 | 2009-08-09 | CollegeFoxesPackingBoxes | Lifestyle/Home | College Foxes Packing Boxes - Consumer Services | Male | ... | 0.0 | NaN | 0.0 | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 4 | 1 | 2009-08-09 | 2010-02-05 | 1 | 5 | 2009-08-09 | IonicEar | Software/Tech | Ionic Ear - Novelties | Male | ... | 0.0 | NaN | 0.0 | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
5 rows × 48 columns
df_shark_tank_2.isnull().sum() # show the total number of null values per column
| Column | Null Count |
|---|---|
| Season_Epi_code | 0 |
| Pitched_Business_Identifier | 0 |
| Pitched_Business_Desc | 0 |
| Deal_Status | 0 |
| Deal_Shark | 323 |

dtype: int64
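# label pitches without a deal; assigning back avoids the chained inplace fillna that newer pandas warns about
df_shark_tank_2['Deal_Shark'] = df_shark_tank_2['Deal_Shark'].fillna('No Deal Made')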
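Question 2 asks about co-investment. Deal_Shark stores the investing sharks as initials joined by '+' (e.g. 'KOL+LG' in the head() output), so co-investing pairs can be counted by splitting each value. A minimal sketch, assuming '+' is the only separator; 'MC+LG' is a hypothetical value added for illustration:

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical Deal_Shark values modelled on the format in df_shark_tank_2.head():
# shark initials joined by '+', plus the 'No Deal Made' placeholder filled in above.
deal_shark = pd.Series(["KOL+LG", "No Deal Made", "KOL", "LG", "MC+LG", "KOL+LG"])

pair_counts = Counter()
for value in deal_shark:
    if value == "No Deal Made":
        continue  # pitches without a deal have no investors
    # Sort initials so that 'LG+KOL' and 'KOL+LG' count as the same pair.
    sharks = sorted(s.strip() for s in value.split("+"))
    for pair in combinations(sharks, 2):
        pair_counts[pair] += 1

print(pair_counts.most_common())
```

Single-shark deals produce no pairs, so they drop out of the counts automatically.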
Since each business/startup has a unique and creative name, we set it as the index of the dataframe.
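That uniqueness is an assumption worth checking, since a repeated name would make label lookups return several rows. pandas can flag duplicates with `duplicated()` or refuse them outright with `set_index(..., verify_integrity=True)`. A sketch on a hypothetical toy frame with one repeated name:

```python
import pandas as pd

# Hypothetical toy frame; the repeated 'Wispots' would break the "unique name" assumption.
df = pd.DataFrame({
    "Startup Name": ["AvaTheElephant", "Wispots", "IonicEar", "Wispots"],
    "Industry": ["Health/Wellness", "Business Services", "Software/Tech", "Business Services"],
})

# Names that occur more than once (keep=False marks every copy).
dupes = df.loc[df["Startup Name"].duplicated(keep=False), "Startup Name"].unique().tolist()
print(dupes)

# verify_integrity=True makes set_index raise ValueError when the new index has duplicates.
try:
    df.set_index("Startup Name", verify_integrity=True)
    index_ok = True
except ValueError:
    index_ok = False
print(index_ok)
```

After the notebook's own `set_index`, `df_st1.index.is_unique` gives the same answer as a single boolean.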
# Setting the index to Startup Name
df_st1.set_index(['Startup Name'], inplace=True)
df_st1.head()
|  | Season Number | Season Start | Season End | Episode Number | Pitch Number | Original Air Date | Industry | Business Description | Pitchers Gender | Pitchers City | ... | Kevin O Leary Investment Equity | Guest Investment Amount | Guest Investment Equity | Guest Name | Barbara Corcoran Present | Mark Cuban Present | Lori Greiner Present | Robert Herjavec Present | Daymond John Present | Kevin O Leary Present |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Startup Name | |||||||||||||||||||||
| AvaTheElephant | 1 | 2009-08-09 | 2010-02-05 | 1 | 1 | 2009-08-09 | Health/Wellness | Ava The Elephant - Baby and Child Care | Female | Atlanta | ... | 0.0 | NaN | 0.0 | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| Mr.Tod'sPieFactory | 1 | 2009-08-09 | 2010-02-05 | 1 | 2 | 2009-08-09 | Food and Beverage | Mr. Tod's Pie Factory - Specialty Food | Male | Somerset | ... | 0.0 | NaN | 0.0 | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| Wispots | 1 | 2009-08-09 | 2010-02-05 | 1 | 3 | 2009-08-09 | Business Services | Wispots - Consumer Services | Male | Cary | ... | 0.0 | NaN | 0.0 | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| CollegeFoxesPackingBoxes | 1 | 2009-08-09 | 2010-02-05 | 1 | 4 | 2009-08-09 | Lifestyle/Home | College Foxes Packing Boxes - Consumer Services | Male | Tampa | ... | 0.0 | NaN | 0.0 | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| IonicEar | 1 | 2009-08-09 | 2010-02-05 | 1 | 5 | 2009-08-09 | Software/Tech | Ionic Ear - Novelties | Male | St. Paul | ... | 0.0 | NaN | 0.0 | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
5 rows × 47 columns
# Create a copy of dataset 2 with only the text columns (.copy() avoids SettingWithCopyWarning)
df_shark_tank_3 = df_shark_tank_2[['Pitched_Business_Identifier', 'Pitched_Business_Desc']].copy()
# Convert all the company names to lower case to match with dataset 1
df_shark_tank_3['Pitched_Business_Identifier_m'] = df_shark_tank_3['Pitched_Business_Identifier'].str.lower()
# Remove all the white space in company names (raw string so '\s' is a regex, not a string escape)
df_shark_tank_3['Name'] = df_shark_tank_3['Pitched_Business_Identifier_m'].str.replace(r'\s', '', regex=True)
df_shark_tank_3.head()
|  | Pitched_Business_Identifier | Pitched_Business_Desc | Pitched_Business_Identifier_m | Name |
|---|---|---|---|---|
| 0 | Bridal Buddy | a functional slip worn under a wedding gown th... | bridal buddy | bridalbuddy |
| 1 | Laid Brand | hair-care products made with pheromones . Laid... | laid brand | laidbrand |
| 2 | Rocketbook | a notebook that can scan contents to cloud ser... | rocketbook | rocketbook |
| 3 | Wine & Design | painting classes with wine served . Wine & Des... | wine & design | wine&design |
| 4 | Peoples Design | a mixing bowl with a built-in scoop . Peoples ... | peoples design | peoplesdesign |
Company names in both datasets are converted to lower case so that they can serve as the column to join on.
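Dataset 1 names are only lowercased in the next cell, while dataset 2 names are lowercased and stripped of whitespace. This lines up because dataset 1 names already contain no spaces (e.g. 'AvaTheElephant'), but applying one normalization to both sides keeps the join keys consistent. A sketch with a hypothetical helper, normalize_name, on names taken from the tables above:

```python
import pandas as pd


def normalize_name(names: pd.Series) -> pd.Series:
    """Hypothetical helper: lowercase names and drop all whitespace to build a shared join key."""
    return names.str.lower().str.replace(r"\s+", "", regex=True)


# Startup names as they appear in dataset 1 and dataset 2 respectively.
ds1 = pd.Series(["AvaTheElephant", "Mr.Tod'sPieFactory", "Wispots"])
ds2 = pd.Series(["Ava the Elephant", "Mr. Tod's Pie Factory", "Wispots"])

print(normalize_name(ds1).tolist())
print(normalize_name(ds2).tolist())
```

Both calls print the same keys, matching the 'Name' values in the merged table (e.g. "mr.tod'spiefactory").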
# Convert all the company name to lower case to match with dataset 2
df_st1['Name'] = df_st1.index.str.lower()
# Perform left merge on the two dataset, preserve the data in dataset 1
df_shark_tank_merged = df_st1.merge(df_shark_tank_3,how='left',on='Name')
df_shark_tank_merged.head()
|  | Season Number | Season Start | Season End | Episode Number | Pitch Number | Original Air Date | Industry | Business Description | Pitchers Gender | Pitchers City | ... | Barbara Corcoran Present | Mark Cuban Present | Lori Greiner Present | Robert Herjavec Present | Daymond John Present | Kevin O Leary Present | Name | Pitched_Business_Identifier | Pitched_Business_Desc | Pitched_Business_Identifier_m |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2009-08-09 | 2010-02-05 | 1 | 1 | 2009-08-09 | Health/Wellness | Ava The Elephant - Baby and Child Care | Female | Atlanta | ... | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | avatheelephant | Ava the Elephant | (Emmy the Elephant during show, trademarked a... | ava the elephant |
| 1 | 1 | 2009-08-09 | 2010-02-05 | 1 | 2 | 2009-08-09 | Food and Beverage | Mr. Tod's Pie Factory - Specialty Food | Male | Somerset | ... | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | mr.tod'spiefactory | Mr. Tod's Pie Factory | a pie company | mr. tod's pie factory |
| 2 | 1 | 2009-08-09 | 2010-02-05 | 1 | 3 | 2009-08-09 | Business Services | Wispots - Consumer Services | Male | Cary | ... | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | wispots | Wispots | an electronic hand-held device for waiting roo... | wispots |
| 3 | 1 | 2009-08-09 | 2010-02-05 | 1 | 4 | 2009-08-09 | Lifestyle/Home | College Foxes Packing Boxes - Consumer Services | Male | Tampa | ... | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | collegefoxespackingboxes | College Foxes Packing Boxes | a packing and organizing service based on an a... | college foxes packing boxes |
| 4 | 1 | 2009-08-09 | 2010-02-05 | 1 | 5 | 2009-08-09 | Software/Tech | Ionic Ear - Novelties | Male | St. Paul | ... | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | ionicear | Ionic Ear | an implantable Bluetooth device requiring surg... | ionic ear |
5 rows × 51 columns
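The left merge only works as intended if both sides normalize names in exactly the same way, and it fails silently when they do not. Below is a minimal sketch, using made-up company names, that normalizes a join key and then uses `merge(..., indicator=True)` to count how many rows of the first dataset actually found a match. The helper `normalize_name` is hypothetical and is not part of the notebook.

```python
import pandas as pd

def normalize_name(names: pd.Series) -> pd.Series:
    # Hypothetical helper: lower-case and drop all whitespace so that
    # "Bridal Buddy" and "BridalBuddy" map to the same join key
    return names.str.lower().str.replace(r"\s", "", regex=True)

# Toy stand-ins for the two datasets (illustrative rows, not the real data)
left = pd.DataFrame({"Startup": ["BridalBuddy", "Rocketbook", "NewCo"]})
right = pd.DataFrame({
    "Pitched_Business_Identifier": ["Bridal Buddy", "Rocketbook"],
    "Pitched_Business_Desc": ["a functional slip", "a notebook"],
})

left["Name"] = normalize_name(left["Startup"])
right["Name"] = normalize_name(right["Pitched_Business_Identifier"])

# A duplicated key on the right would silently duplicate rows on the left
assert right["Name"].is_unique

# how="left" keeps every row of the first dataset; indicator=True adds a
# "_merge" column recording whether each row found a partner
merged = left.merge(right, how="left", on="Name", indicator=True)
print(merged["_merge"].value_counts())
```

Rows flagged `left_only` are the ones that will carry NaN in every dataset-2 column after the merge.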
Since the longer pitch descriptions in the second dataset only cover seasons 1-8, we combine the short Business Description from dataset 1 with the longer description into a single column.
# Merge the long description with the shorter description into one column
df_shark_tank_merged['Business Description']=df_shark_tank_merged['Business Description'].fillna('').map(str)+'-'+df_shark_tank_merged['Pitched_Business_Desc'].fillna('').map(str)
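# Use the original-case name from dataset 2 with white space removed; rows with no match in dataset 2 keep the lower-case join key instead of becoming NaN
df_shark_tank_merged['Name'] = df_shark_tank_merged['Pitched_Business_Identifier'].str.replace(r'\s','',regex=True).fillna(df_shark_tank_merged['Name'])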
#Setting the index of the dataframe as Name
df_shark_tank_merged.set_index(['Name'],inplace=True)
df_shark_tank_merged.head()
| Name | Season Number | Season Start | Season End | Episode Number | Pitch Number | Original Air Date | Industry | Business Description | Pitchers Gender | Pitchers City | ... | Guest Name | Barbara Corcoran Present | Mark Cuban Present | Lori Greiner Present | Robert Herjavec Present | Daymond John Present | Kevin O Leary Present | Pitched_Business_Identifier | Pitched_Business_Desc | Pitched_Business_Identifier_m |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AvatheElephant | 1 | 2009-08-09 | 2010-02-05 | 1 | 1 | 2009-08-09 | Health/Wellness | Ava The Elephant - Baby and Child Care- (Emmy ... | Female | Atlanta | ... | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | Ava the Elephant | (Emmy the Elephant during show, trademarked a... | ava the elephant |
| Mr.Tod'sPieFactory | 1 | 2009-08-09 | 2010-02-05 | 1 | 2 | 2009-08-09 | Food and Beverage | Mr. Tod's Pie Factory - Specialty Food-a pie c... | Male | Somerset | ... | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | Mr. Tod's Pie Factory | a pie company | mr. tod's pie factory |
| Wispots | 1 | 2009-08-09 | 2010-02-05 | 1 | 3 | 2009-08-09 | Business Services | Wispots - Consumer Services-an electronic hand... | Male | Cary | ... | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | Wispots | an electronic hand-held device for waiting roo... | wispots |
| CollegeFoxesPackingBoxes | 1 | 2009-08-09 | 2010-02-05 | 1 | 4 | 2009-08-09 | Lifestyle/Home | College Foxes Packing Boxes - Consumer Service... | Male | Tampa | ... | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | College Foxes Packing Boxes | a packing and organizing service based on an a... | college foxes packing boxes |
| IonicEar | 1 | 2009-08-09 | 2010-02-05 | 1 | 5 | 2009-08-09 | Software/Tech | Ionic Ear - Novelties-an implantable Bluetooth... | Male | St. Paul | ... | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | Ionic Ear | an implantable Bluetooth device requiring surg... | ionic ear |
5 rows × 50 columns
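When the long description is missing, as it is for seasons 9-14, the concatenation above leaves a trailing '-' on the short description. The sketch below shows one way to join only the parts that are actually present. The separator `" - "`, the helper `join_present` and the toy descriptions are illustrative choices and do not come from the notebook.

```python
import pandas as pd

# Toy descriptions: short ones from dataset 1, long ones from dataset 2 (None = no match)
short_desc = pd.Series(["Ava The Elephant - Baby and Child Care", "Ionic Ear - Novelties", None])
long_desc = pd.Series(["(Emmy the Elephant during show)", None, "a pie company"])

def join_present(parts, sep=" - "):
    # Hypothetical helper: keep only parts that are present and non-empty, then join them
    return sep.join(str(p) for p in parts if pd.notna(p) and str(p) != "")

# Pair the two columns row by row and join whatever is available
combined = pd.Series(
    [join_present(pair) for pair in zip(short_desc, long_desc)],
    index=short_desc.index,
)
print(combined.tolist())
```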
This analysis employs a range of techniques including data cleaning and preprocessing, statistical analysis, visualization, text analysis (including NLP techniques like sentiment analysis and TF-IDF), all to derive insights from the Shark Tank dataset. The analysis provides a multifaceted view of the show's dynamics, from investment patterns and success rates to the textual analysis of pitches and the impact of guest appearances.
Can we identify temporal patterns in pitch success on the show and how do they evolve over the seasons, including viewership trends?
# Converting 'Original Air Date' to datetime format and extracting month and year
df_shark_tank_1['Original Air Date'] = pd.to_datetime(df_shark_tank_1['Original Air Date'])
df_shark_tank_1['Month'] = df_shark_tank_1['Original Air Date'].dt.month
df_shark_tank_1['Year'] = df_shark_tank_1['Original Air Date'].dt.year
# Finding if a deal has been secured
df_shark_tank_1['Success'] = df_shark_tank_1['Got Deal'] == 1
# Creating a variable num_to_percentage that will be used to convert the rate into percentage
num_to_percentage = 100
# Group by month and calculate success rate
monthly_success_rate = df_shark_tank_1.groupby('Month')['Success'].mean()*num_to_percentage
# Group by season and calculate success rate
seasonal_success_rate = df_shark_tank_1.groupby('Season Number')['Success'].mean()*num_to_percentage
# Create subplots with 3 vertical line charts
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(8,10))
# Plotting the success rate by month
ax1.plot(monthly_success_rate.index, monthly_success_rate.values)
ax1.set_title('Success Rate by Month')
ax1.set_xlabel('Month')
ax1.set_ylabel('Success Rate (in %)')
ax1.set_xticks(range(1, 13))
ax1.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
# Plotting the success rate by season
ax2.plot(seasonal_success_rate.index, seasonal_success_rate.values)
ax2.set_title('Success Rate by Season')
ax2.set_xlabel('Season Number')
ax2.set_xticks(range(1, 15))
ax2.set_ylabel('Success Rate (in %)')
# Grouping by year and calculate average viewership
yearly_viewership = df_shark_tank_1.groupby('Year')['US Viewership'].mean()
# Plotting the viewership trend over the years
ax3.plot(yearly_viewership.index, yearly_viewership.values)
ax3.set_title('Average Yearly Viewership Trends')
ax3.set_xlabel('Year')
ax3.set_ylabel('Average Viewership (in Millions)')
# Making a few adjustments to ensure all line charts are evenly spaced
plt.tight_layout()
plt.subplots_adjust(hspace=0.4)
plt.show()
Figure 10.1: Line charts of the pitch success rate by month and by season, and of the average US viewership by year.
The Success Rate by Month shows a marked increase in December. One possible explanation is the end of the fiscal year, when investors may be inclined to use remaining budgets. December is also a time of heightened consumer activity due to the holiday season, which may positively influence investment decisions. Conversely, the success rate dips in July and August, months typically associated with a slowdown in business activity. Note, however, that the month is taken from the original air date rather than the date the pitch was recorded, so these fiscal and seasonal explanations should be read as tentative associations rather than causes.
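A monthly rate built from only a handful of pitches can swing sharply, so a spike such as December's is easier to judge when the number of pitches behind each rate is shown alongside it. Below is a minimal sketch on made-up data. It reuses the `Original Air Date` and `Got Deal` column names from above, and the output column names `success_rate` and `n_pitches` are illustrative.

```python
import pandas as pd

# Toy pitch-level data (made up): air date and whether a deal was made
pitches = pd.DataFrame({
    "Original Air Date": pd.to_datetime(
        ["2019-12-01", "2019-12-08", "2019-07-14", "2019-07-21", "2019-07-28", "2020-01-05"]
    ),
    "Got Deal": [1, 1, 0, 1, 0, 1],
})
pitches["Month"] = pitches["Original Air Date"].dt.month

# Named aggregation: success rate and the number of pitches behind it
monthly = pitches.groupby("Month")["Got Deal"].agg(success_rate="mean", n_pitches="count")
monthly["success_rate"] = monthly["success_rate"] * 100
print(monthly)
```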
The Success Rate by Season shows an upward trajectory, with a higher percentage of pitches securing deals in later seasons of the show. This may indicate that entrepreneurs' presentations have become more effective over time, or that the sharks have grown more willing to make deals as the series evolved. The variation between individual seasons reflects the dynamic nature of investing and the changing strategies of both the pitchers and the sharks. Overall, the data points to a higher rate of successful investments as the seasons advance.
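One way to put a number on the "upward trajectory" is to fit a least-squares line to the seasonal success rates. The slope of that line is the average change in percentage points per season. Below is a sketch using `numpy.polyfit`. The rates are placeholders, not values from the dataset; in the notebook they would come from `seasonal_success_rate`.

```python
import numpy as np

# Placeholder seasonal success rates in %, NOT values from the dataset
seasons = np.arange(1, 9)
rates = np.array([40.0, 45.0, 42.0, 50.0, 55.0, 52.0, 60.0, 62.0])

# Least-squares straight line: rate ~ slope * season + intercept
# (polyfit returns coefficients from the highest degree down)
slope, intercept = np.polyfit(seasons, rates, deg=1)
print(f"Average change: {slope:.2f} percentage points per season")
```

A positive slope alone does not show that the trend is robust. With only a few seasons, one unusual season can move the line noticeably.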
The Average Yearly Viewership trend peaks in 2014, marking the height of Shark Tank's popularity in its early years, when it drew a large television audience. The decline since then is consistent with the broader shift in audience preferences towards on-demand streaming services, a shift that has accelerated since COVID, and with the diversification of entertainment options. Both have affected traditional TV ratings across the board. Yet the show's enduring presence points to its core appeal and the loyalty of its audience. Interesting pitches from the show are now posted on YouTube as clips, and Shark Tank has partnered with streaming platforms like Hulu and Amazon Prime, indicating that it has adapted to this audience shift.
December emerges as the month with the highest success rate, possibly influenced by fiscal year-end and holiday factors. The seasonal success rate trends upward, which may reflect more effective pitches or the sharks' growing willingness to invest. Viewership, meanwhile, peaked in 2014 and has declined with the rise of on-demand streaming, emphasizing the need to adapt to evolving viewer preferences.
Are there statistically significant co-investment patterns among the sharks, revealing insights into their investment strategies and price discrimination?
Let's start off by filtering all the startups that received a deal from at least one shark. This can be done with the 'Got Deal' variable, which has a value of 1 when a startup has received a deal.
#Finding all companies that got investment deals
df_deal=df_st1[df_st1['Got Deal']==1]
Descriptive Statistics
To understand the basic composition of the filtered data, let's look at some descriptive statistics.
#Descriptive Statistics for Investments by Investor
df_desc_stat=df_deal[['Barbara Corcoran Investment Amount','Mark Cuban Investment Amount','Lori Greiner Investment Amount','Robert Herjavec Investment Amount','Daymond John Investment Amount','Kevin O Leary Investment Amount','Guest Investment Amount']].describe()
df_desc_stat.applymap('{:,.2f}'.format)
| | Barbara Corcoran Investment Amount | Mark Cuban Investment Amount | Lori Greiner Investment Amount | Robert Herjavec Investment Amount | Daymond John Investment Amount | Kevin O Leary Investment Amount | Guest Investment Amount |
|---|---|---|---|---|---|---|---|
| count | 118.00 | 229.00 | 198.00 | 121.00 | 111.00 | 116.00 | 104.00 |
| mean | 148,644.07 | 257,748.18 | 215,370.37 | 292,539.94 | 182,430.93 | 240,021.55 | 213,854.17 |
| std | 116,200.37 | 279,351.66 | 209,730.35 | 538,725.92 | 297,866.27 | 301,853.38 | 212,171.45 |
| min | 12,500.00 | 12,500.00 | 17,500.00 | 5,000.00 | 5,000.00 | 20,000.00 | 20,000.00 |
| 25% | 50,000.00 | 75,000.00 | 75,000.00 | 100,000.00 | 50,000.00 | 82,500.00 | 75,000.00 |
| 50% | 100,000.00 | 150,000.00 | 150,000.00 | 187,500.00 | 120,000.00 | 150,000.00 | 125,000.00 |
| 75% | 200,000.00 | 300,000.00 | 268,750.00 | 300,000.00 | 215,000.00 | 255,000.00 | 250,000.00 |
| max | 700,000.00 | 2,000,000.00 | 1,175,000.00 | 5,000,000.00 | 3,000,000.00 | 2,500,000.00 | 1,250,000.00 |
To find co-investment patterns, let's first separate the investments made by each shark into their own dataframes.
#Finding All Deals Invested by a particular investor
df_barb=df_deal[df_deal['Barbara Corcoran Investment Amount'].notnull()]
df_mark=df_deal[df_deal['Mark Cuban Investment Amount'].notnull()]
df_lori=df_deal[df_deal['Lori Greiner Investment Amount'].notnull()]
df_rob=df_deal[df_deal['Robert Herjavec Investment Amount'].notnull()]
df_kev=df_deal[df_deal['Kevin O Leary Investment Amount'].notnull()]
df_daym=df_deal[df_deal['Daymond John Investment Amount'].notnull()]
df_guest=df_deal[df_deal['Guest Investment Amount'].notnull()]
Now that each shark's investments are in a separate dataframe, let's create sets containing the names of the companies each shark has invested in.
#Creating sets to find what companies an investor has invested in
inv_barb=set(df_barb.index)
inv_mark=set(df_mark.index)
inv_lori=set(df_lori.index)
inv_rob=set(df_rob.index)
inv_kev=set(df_kev.index)
inv_daym=set(df_daym.index)
inv_guest=set(df_guest.index)
Next, let's group the deals by the number of sharks involved to see which categories contain enough deals to compare for co-investment patterns.
#Counting the deals for each number of sharks involved
grouped_by_no_of_deals=df_deal.groupby(df_deal['Number of sharks in deal'])
for name,group in grouped_by_no_of_deals:
    print(f'{len(group)} companies with {int(name)} shark(s) in the deal')
564 companies with 1 shark(s) in the deal
170 companies with 2 shark(s) in the deal
17 companies with 3 shark(s) in the deal
3 companies with 4 shark(s) in the deal
6 companies with 5 shark(s) in the deal
We can see that only 4 categories (deals with 2, 3, 4 and 5 sharks) need to be considered for co-investment patterns.
The deals with just 1 shark don't have any co-investments and can be ignored.
We shall look into each of these categories to identify the top co-investors.
#Creating New DataFrames to hold deals with 5,4 and 3 sharks respectively
df_5_sharks=df_deal[df_deal['Number of sharks in deal']==5]
df_4_sharks=df_deal[df_deal['Number of sharks in deal']==4]
df_3_sharks=df_deal[df_deal['Number of sharks in deal']==3]
Let's now understand how we are planning to find the co-investments made by each shark.
Since we have a separate dataframe and a set of company names for each investor, we can use set operations on them to uncover the co-investment patterns.
Performing a simple '&' operation on two sets gives their intersection, i.e. all the companies that both investors have co-invested in.
We need every combination of sets to understand the investment patterns, so we shall take the help of the 'itertools' library.
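Below is a tiny illustration of the two set operators and of `itertools.combinations`. It uses hypothetical portfolios rather than the real investment sets.

```python
from itertools import combinations

# Hypothetical portfolios (sets of startup names) for three investors
portfolios = {
    "Mark": {"A", "B", "C", "D"},
    "Lori": {"B", "C", "E"},
    "Guest": {"C", "F"},
}

# '&' is intersection: startups BOTH investors backed
both = portfolios["Mark"] & portfolios["Lori"]
# '|' is union: startups EITHER investor backed
either = portfolios["Mark"] | portfolios["Lori"]

# Every pair of investors and the size of their overlap
pair_overlap = {
    pair: len(portfolios[pair[0]] & portfolios[pair[1]])
    for pair in combinations(portfolios, 2)
}
print(sorted(both), sorted(either), pair_overlap)
```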
# importing itertools.combinations to find various combinations in sets
from itertools import combinations
#Defining value sets and dictionaries to resolve key value pairs
list_sets_names=['inv_barb','inv_mark','inv_lori','inv_rob','inv_daym','inv_kev','inv_guest']
list_of_sets=[inv_barb,inv_lori,inv_mark,inv_daym,inv_rob,inv_kev,inv_guest]
dict_names={'inv_barb':'Barbara Corcoran',
'inv_mark':'Mark Cuban',
'inv_lori':'Lori Greiner',
'inv_rob':'Robert Harjavec',
'inv_daym':'Daymond John',
'inv_kev':'Kevin O Leary',
'inv_guest':'Guest'}
Since no deal involves more than 5 sharks, we shall start with co-investments by groups of 5 sharks.
Let's find out which groups of sharks huddled together and co-invested the most.
To do so, let's write a method that computes, for every group of 'n' sharks, the set of companies they have all invested in, and returns the groups that co-invested in the largest number of companies.
def max_comb(n, list_sets, dicts):
    '''Method max_comb(n,list_sets,dicts)
    METHOD TO FIND THE GROUPS OF A SPECIFIC NUMBER OF INVESTORS WITH THE MAXIMUM NUMBER OF CO-INVESTMENTS
    n -- THE NUMBER OF SHARKS WHO HAVE CO-INVESTED
    list_sets -- LIST OF THE NAMES OF THE SETS WHERE EACH SHARK HAS INVESTED
    dicts -- DICTIONARY TO RESOLVE WHICH SET BELONGS TO WHICH SHARK
    return
    max_res -- THE MAXIMUM NUMBER OF CO-INVESTMENTS DONE BY A GROUP
    max_set_str -- A LIST OF GROUPS OF INVESTORS WHO HAD THE MAXIMUM NUMBER OF CO-INVESTMENTS
    set_investments_to_update -- A SET OF INVESTMENTS THAT HAVE TO BE REMOVED FROM THE SETS
    '''
    max_res=0
    max_set_str=[]
    set_investments_to_update=set()
    #Finding all combinations from the list of sets with n selections
    list_comb=list(combinations(list_sets, r=n))
    for each in list_comb:
        #Finding out all the companies a combination of investors have co-invested in
        #(each entry is a variable name such as 'inv_barb', so this evaluates e.g. inv_barb&inv_mark)
        this_set=eval('&'.join(each))
        # Adding companies to final set to be updated
        set_investments_to_update.update(this_set)
        # Assigning maximum number and list of investors who have co-invested
        if len(this_set)>max_res:
            max_res=len(this_set)
            max_set_str=[]
            max_set_str.append(', '.join(list(dicts[i] for i in each)))
        elif len(this_set)==max_res:
            max_set_str.append(', '.join(list(dicts[i] for i in each)))
    return max_res,max_set_str,set_investments_to_update
Logically, any company in which 5 sharks have invested will also be considered when we check for companies where 4 sharks have co-invested. We have to update the investment sets to make these cases mutually exclusive.
Let's write a function that removes from the sets all the companies that have already been accounted for in a previous analysis.
def update_investments(list_to_upd):
    '''METHOD update_investments(list_to_upd)
    METHOD TO UPDATE INVESTMENT SETS WITH LARGER NUMBER OF CO-INVESTORS TO GIVE ACCURATE SETS FOR SMALLER CO-INVESTMENTS
    list_to_upd -- LIST OF INVESTMENTS TO BE REMOVED FROM THE INVESTMENT SETS,
    SO ANY INVESTMENTS THAT HAVE BEEN ACCOUNTED FOR IN PREVIOUS ANALYSES DON'T EXAGGERATE CURRENT ANALYSES
    return None
    '''
    for each in list_of_sets:
        for i in list_to_upd:
            #Update sets to discard investments
            each.discard(i)
To make our lives easier, let's also write a function that displays our analysis in a readable format.
def print_max_comb(n, list_sets, dicts):
    '''METHOD print_max_comb(n,list_sets,dicts)
    METHOD TO PRINT MAXIMUM COMBINATION OF INVESTORS IN A READABLE FORMAT
    n -- THE NUMBER OF SHARKS WHO HAVE CO-INVESTED
    list_sets -- LIST OF THE NAMES OF THE SETS WHERE EACH SHARK HAS INVESTED
    dicts -- DICTIONARY TO RESOLVE WHICH SET BELONGS TO WHICH SHARK
    return set_investments_to_update, CASCADED FROM THE max_comb METHOD: A SET OF INVESTMENTS THAT HAVE TO BE REMOVED FROM THE SETS
    '''
    number,investor_list,set_investments_to_update=max_comb(n,list_sets,dicts)
    print(f'The following groups of investors have co-invested in {number} investment(s):\n')
    for i,each in enumerate(investor_list):
        print(i+1,'. ',each)
    print('\n\nPlease note that they are the sole investors in the above stated startups')
    return set_investments_to_update
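The same peeling procedure (largest groups first, then removing the startups already counted) can be written without `eval` or module-level sets by keeping the portfolios in a dictionary. Below is a sketch under that assumption. The function `co_investment_peel` and the toy portfolios are hypothetical, and unlike `max_comb` it does not list groups that share no startups.

```python
from itertools import combinations

def co_investment_peel(portfolios, max_group):
    """Hypothetical eval-free variant of max_comb + update_investments.

    portfolios -- dict mapping investor name to a set of startup names
    max_group  -- largest number of co-investors to consider
    Returns {group_size: (count, [groups with that count])}.
    """
    remaining = {name: set(deals) for name, deals in portfolios.items()}  # work on copies
    results = {}
    for n in range(max_group, 1, -1):
        best, best_groups, seen = 0, [], set()
        for group in combinations(remaining, n):
            shared = set.intersection(*(remaining[name] for name in group))
            seen |= shared
            if len(shared) > best:
                best, best_groups = len(shared), [group]
            elif len(shared) == best and best > 0:  # groups sharing nothing are not listed
                best_groups.append(group)
        results[n] = (best, best_groups)
        # Remove startups already counted so that smaller groups stay mutually exclusive
        for deals in remaining.values():
            deals -= seen
    return results

portfolios = {
    "Barbara": {"S1", "S2"},
    "Mark": {"S1", "S2", "S3", "S4"},
    "Lori": {"S1", "S3", "S4"},
    "Guest": {"S3"},
}
result = co_investment_peel(portfolios, 3)
print(result)
```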
df_5_sharks[['Barbara Corcoran Investment Amount','Mark Cuban Investment Amount','Lori Greiner Investment Amount','Robert Herjavec Investment Amount','Daymond John Investment Amount','Kevin O Leary Investment Amount','Guest Investment Amount']].style.applymap(lambda x: "background-color: #9DE09E" if x>0 else "background-color: None")
| Startup Name | Barbara Corcoran Investment Amount | Mark Cuban Investment Amount | Lori Greiner Investment Amount | Robert Herjavec Investment Amount | Daymond John Investment Amount | Kevin O Leary Investment Amount | Guest Investment Amount |
|---|---|---|---|---|---|---|---|
| ClassroomJams | 50000.000000 | 50000.000000 | nan | 50000.000000 | 50000.000000 | 50000.000000 | nan |
| BuggyBeds | 50000.000000 | 50000.000000 | nan | 50000.000000 | 50000.000000 | 50000.000000 | nan |
| Breathometer | nan | 200000.000000 | 200000.000000 | 200000.000000 | 200000.000000 | 200000.000000 | nan |
| XCraft | nan | 300000.000000 | 300000.000000 | 300000.000000 | 300000.000000 | 300000.000000 | nan |
| CupBoardPro | nan | 20000.000000 | 20000.000000 | nan | 20000.000000 | 20000.000000 | 20000.000000 |
| Eyewris | 25000.000000 | 25000.000000 | 25000.000000 | nan | 25000.000000 | 25000.000000 | nan |
Table 10.2.1: Tabular representation of the companies where 5 sharks have co-invested together.
update_investments(print_max_comb(5,list_sets_names,dict_names))
The following groups of investors have co-invested in 2 investment(s):
1 . Barbara Corcoran, Mark Cuban, Robert Harjavec, Daymond John, Kevin O Leary
2 . Mark Cuban, Lori Greiner, Robert Harjavec, Daymond John, Kevin O Leary
Please note that they are the sole investors in the above stated startups
df_4_sharks[['Barbara Corcoran Investment Amount','Mark Cuban Investment Amount','Lori Greiner Investment Amount','Robert Herjavec Investment Amount','Daymond John Investment Amount','Kevin O Leary Investment Amount','Guest Investment Amount']].style.applymap(lambda x: "background-color: #9DE09E" if x>0 else "background-color: none")
| Startup Name | Barbara Corcoran Investment Amount | Mark Cuban Investment Amount | Lori Greiner Investment Amount | Robert Herjavec Investment Amount | Daymond John Investment Amount | Kevin O Leary Investment Amount | Guest Investment Amount |
|---|---|---|---|---|---|---|---|
| CoffeeJoulies | nan | nan | 37500.000000 | 37500.000000 | 37500.000000 | 37500.000000 | nan |
| BeeD'Vine | nan | 187500.000000 | 187500.000000 | 187500.000000 | nan | nan | 187500.000000 |
| Songlorious | nan | 125000.000000 | nan | nan | 125000.000000 | 125000.000000 | 125000.000000 |
Table 10.2.2: Tabular representation of the companies where 4 sharks have co-invested together.
update_investments(print_max_comb(4,list_sets_names,dict_names))
The following groups of investors have co-invested in 1 investment(s):
1 . Mark Cuban, Lori Greiner, Robert Harjavec, Guest
2 . Mark Cuban, Daymond John, Kevin O Leary, Guest
3 . Lori Greiner, Robert Harjavec, Daymond John, Kevin O Leary
Please note that they are the sole investors in the above stated startups
df_3_sharks[['Barbara Corcoran Investment Amount','Mark Cuban Investment Amount','Lori Greiner Investment Amount','Robert Herjavec Investment Amount','Daymond John Investment Amount','Kevin O Leary Investment Amount','Guest Investment Amount']].style.applymap(lambda x: "background-color: #9DE09E" if x>0 else "background-color: none")
| Startup Name | Barbara Corcoran Investment Amount | Mark Cuban Investment Amount | Lori Greiner Investment Amount | Robert Herjavec Investment Amount | Daymond John Investment Amount | Kevin O Leary Investment Amount | Guest Investment Amount |
|---|---|---|---|---|---|---|---|
| Soy-Yer-Dough | nan | nan | nan | 100000.000000 | 100000.000000 | 100000.000000 | nan |
| FirstDefenseNasalScreen | nan | 250000.000000 | nan | 250000.000000 | 250000.000000 | nan | nan |
| M3GirlDesigns | nan | 100000.000000 | 100000.000000 | 100000.000000 | nan | nan | nan |
| VelocitySigns | nan | 75000.000000 | nan | 75000.000000 | nan | 75000.000000 | nan |
| PittMoss | nan | 200000.000000 | nan | 200000.000000 | nan | 200000.000000 | nan |
| SharkWheel | nan | 75000.000000 | nan | 75000.000000 | nan | nan | 75000.000000 |
| SarahOliverHandbags | nan | nan | 83333.333330 | 83333.333330 | nan | 83333.333330 | nan |
| CombatFlipFlops | nan | 100000.000000 | 100000.000000 | nan | 100000.000000 | nan | nan |
| BeeFreeHonee | 70000.000000 | 70000.000000 | nan | nan | nan | nan | 70000.000000 |
| Goverre | nan | 66666.666670 | 66666.666670 | 66666.666670 | nan | nan | nan |
| QBall | nan | 100000.000000 | 100000.000000 | nan | nan | nan | 100000.000000 |
| Grypmat | nan | 120000.000000 | 120000.000000 | nan | nan | nan | 120000.000000 |
| SnapClips | nan | 50000.000000 | 50000.000000 | nan | nan | nan | 50000.000000 |
| Aira | nan | nan | 166666.666700 | 166666.666700 | nan | 166666.666700 | nan |
| SafetyNailer | nan | 33333.333330 | 33333.333330 | nan | nan | nan | 33333.333330 |
| FlaskyFlowers | nan | 25000.000000 | 25000.000000 | nan | nan | 25000.000000 | nan |
| Browndages | nan | 33333.333000 | 33333.333000 | nan | 33333.333000 | nan | nan |
Table 10.2.3: Tabular representation of the companies where 3 sharks have co-invested together.
update_investments(print_max_comb(3,list_sets_names,dict_names))
The following groups of investors have co-invested in 4 investment(s):
1 . Mark Cuban, Lori Greiner, Guest
Please note that they are the sole investors in the above stated startups
name_list=['Barbara','Lori','Mark','Daymond','Robert','Kevin','Guest']
main_list=[]
#Counting the companies each pair of investors has co-invested in (0 on the diagonal)
for n,i in enumerate(list_of_sets):
    sub_list=[]
    for m,j in enumerate(list_of_sets):
        # Compare positions rather than set contents, so two investors with identical sets are still counted
        if n!=m:
            sub_list.append(len(i&j))
        else:
            sub_list.append(0)
    main_list.append(sub_list)
df_2_sharks=pd.DataFrame(main_list,columns=name_list)
df_2_sharks.index=name_list
styled_data = df_2_sharks.style.background_gradient(cmap='Blues',axis=None)
styled_data
| | Barbara | Lori | Mark | Daymond | Robert | Kevin | Guest |
|---|---|---|---|---|---|---|---|
| Barbara | 0 | 1 | 20 | 4 | 2 | 5 | 4 |
| Lori | 1 | 0 | 31 | 13 | 10 | 3 | 19 |
| Mark | 20 | 31 | 0 | 6 | 7 | 8 | 17 |
| Daymond | 4 | 13 | 6 | 0 | 9 | 1 | 5 |
| Robert | 2 | 10 | 7 | 9 | 0 | 3 | 1 |
| Kevin | 5 | 3 | 8 | 1 | 3 | 0 | 1 |
| Guest | 4 | 19 | 17 | 5 | 1 | 1 | 0 |
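The table above is a dataframe output, styled for better understanding. Each cell counts the startups in which the row and column investors were the only two sharks to invest, since companies with three or more co-investors have already been removed from the sets.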
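The pairwise table can also be computed in a single step. If `B` is a deals × investors 0/1 indicator matrix, then `B.T @ B` counts, for every pair of investors, the deals they share, and its diagonal holds each investor's own deal count. Below is a sketch with a hypothetical indicator matrix. To reproduce the table above, `B` would need to contain only the deals left after removing companies with three or more co-investors. One assumed way to build it from the notebook's data is `notnull().astype(int)` on the `... Investment Amount` columns.

```python
import numpy as np
import pandas as pd

# Hypothetical deals x investors indicator: 1 if the investor joined the deal
indicator = pd.DataFrame(
    {"Barbara": [1, 0, 1, 0], "Mark": [1, 1, 0, 1], "Lori": [0, 1, 0, 1]},
    index=["S1", "S2", "S3", "S4"],
)

b = indicator.to_numpy()
counts = b.T @ b              # entry (i, j) = number of deals shared by investors i and j
np.fill_diagonal(counts, 0)   # the diagonal held each investor's own deal count; zero it like the table above
co_matrix = pd.DataFrame(counts, index=indicator.columns, columns=indicator.columns)
print(co_matrix)
```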
update_investments(print_max_comb(2,list_sets_names,dict_names))
The following groups of investors have co-invested in 31 investment(s):
1 . Mark Cuban, Lori Greiner
Please note that they are the sole investors in the above stated startups
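The research question asks about *statistically significant* co-investment patterns, but raw pair counts favour sharks who simply make more deals. Mark Cuban, for example, has the largest number of deals in the descriptive statistics. One standard way to check whether a pair shares more deals than chance would predict is a one-sided hypergeometric test (the test behind Fisher's exact test): it treats each shark's number of deals as fixed and asks how surprising the observed overlap would be if the deals were assigned at random. Below is a sketch in plain Python. The function name and the counts are placeholders, not values computed from the dataset.

```python
from math import comb

def co_investment_p_value(n_deals, deals_a, deals_b, joint):
    """One-sided hypergeometric p-value: the probability of seeing at least
    `joint` shared deals if investor B's deals were a random subset of all
    n_deals, with investor A's deals held fixed."""
    total = comb(n_deals, deals_b)
    upper = min(deals_a, deals_b)
    tail = sum(
        comb(deals_a, k) * comb(n_deals - deals_a, deals_b - k)
        for k in range(joint, upper + 1)
    )
    return tail / total

# Placeholder counts, NOT taken from the dataset
n_deals, deals_a, deals_b, joint = 100, 30, 25, 15
expected = deals_a * deals_b / n_deals   # overlap expected by chance: 7.5
p = co_investment_p_value(n_deals, deals_a, deals_b, joint)
print(f"expected overlap {expected:.1f}, observed {joint}, p = {p:.2e}")
```

When many pairs are tested at once, a multiple-comparison correction would also be worth applying.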
Diving Deep into Mark & Lori's Co-investments:
Let's look into the Co-investments of Mark and Lori, and try to understand if there are any patterns that we can observe.
To perform this analysis, we shall first find all the startups in which both investors have invested.
#Finding the startups in which both Mark and Lori have invested
df_mark_lori=df_mark[df_mark.index.isin(df_lori.index)]
Since we have all the investments made by them, let's see if they have co-invested heavily in any Industry.
We shall group the investments by Industry and find out what percentage of all the investments they made in that industry were co-investments with the other person.
#Grouping Investments by Industry
mark_grouped=df_mark.groupby('Industry')
lori_grouped=df_lori.groupby('Industry')
grouped = df_mark_lori.groupby('Industry')
Since we now have the grouped objects, let's unpack them into tuples of industry names and groups.
#Creating Iterables from groupby results
name,group =zip(*grouped)
name_mark,mark_group=zip(*mark_grouped)
name_lori,lori_group=zip(*lori_grouped)
The unpacked iterables need to be matched by Industry, and DataFrames make this easy.
Let's convert all the iterables into DataFrames and merge them using Industry as the key.
#Making Dataframes from the unpacked iterables (number of deals per industry) and merging them to make the output dataframe
df_coinv=pd.DataFrame({'Industry':name,'Co-Investments':[len(g) for g in group]})
df_markinv=pd.DataFrame({'Industry':name_mark,'Mark Investments':[len(g) for g in mark_group]})
df_loriinv=pd.DataFrame({'Industry':name_lori,'Lori Investments':[len(g) for g in lori_group]})
df_model_op=pd.merge(df_markinv,df_loriinv, on='Industry').merge(df_coinv, on='Industry')
Now, we have the number of Investments made by each investor individually and the number of co-investments made in each Industry.
For each industry, let's derive what percentage of each investor's deals were co-investments with the other. This tells us whether either of them tends to co-invest more in a particular industry.
#Making derived columns
df_model_op['Co-investment Percentage of Mark']=df_model_op['Co-Investments']/df_model_op['Mark Investments']*100
df_model_op['Co-investment Percentage of Lori']=df_model_op['Co-Investments']/df_model_op['Lori Investments']*100
Since we now have the required values, let's drop the columns used to calculate them and make the output more readable.
#Dropping Columns which were used to calculate derived values
df_model_op.drop(columns=['Mark Investments','Lori Investments'],inplace=True)
#Setting Index for better readability
df_model_op.set_index('Industry',inplace=True)
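The same percentages can be obtained without unpacking the groupby objects: count the rows per industry with `groupby(...).size()` and divide. Below is a sketch on toy data. The boolean `Mark`/`Lori` columns are hypothetical stand-ins for "this shark invested". For an industry with no co-investments, the division yields NaN instead of dropping the row as the inner merge above does.

```python
import pandas as pd

# Toy deals (made up): industry plus which of the two sharks invested
deals = pd.DataFrame({
    "Industry": ["Lifestyle/Home", "Lifestyle/Home", "Lifestyle/Home",
                 "Food and Beverage", "Food and Beverage"],
    "Mark": [True, True, False, True, True],
    "Lori": [True, False, True, True, False],
})

mark_n = deals[deals["Mark"]].groupby("Industry").size()
lori_n = deals[deals["Lori"]].groupby("Industry").size()
both_n = deals[deals["Mark"] & deals["Lori"]].groupby("Industry").size()

# Co-investments as a share of each shark's own deals in that industry
pct = pd.DataFrame({
    "Co-investment Percentage of Mark": both_n / mark_n * 100,
    "Co-investment Percentage of Lori": both_n / lori_n * 100,
})
print(pct)
```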
# Define Indices and Bar Width
bar_width = 0.35
index = np.arange(len(df_model_op.index))
# Plot the Bars
plt.figure(figsize=(12, 6))
bar1 = plt.bar(index - bar_width/2, df_model_op['Co-investment Percentage of Mark'], bar_width, label='Mark')
bar2 = plt.bar(index + bar_width/2, df_model_op['Co-investment Percentage of Lori'], bar_width, label='Lori')
# Add labels and title
plt.xlabel('Industry')
plt.ylabel('Co-investment Percentage')
plt.title('Co-investment Percentage by Industry')
plt.xticks(ticks=index,labels = df_model_op.index, rotation=45, ha='right') # Rotate x-axis labels for better readability
#Add annotations to highlight anomalies
plt.annotate('Mark\'s Co-Investment Percentage is higher', xy =(7-bar_width/2, 44),
xytext =(3, 50),
arrowprops = dict(facecolor ='black',
shrink = 0.05),)
plt.annotate('Mark\'s Co-Investment Percentage is higher', xy =(2-bar_width/2, 10),
xytext =(3, 50),
arrowprops = dict(facecolor ='black',
shrink = 0.05),)
plt.legend()
# Show the plot
plt.show()
We can observe that Lori has a higher co-investment percentage with Mark in all industries except Lifestyle/Home and Children/Education.
The difference in the Children/Education industry is meagre, so we do not read much into it; no formal significance test was performed.
In contrast, Mark has a noticeably higher co-investment percentage with Lori in the Lifestyle/Home sector.
Lori Greiner is often referred to as the "Queen of QVC", and her area of expertise includes products from the Lifestyle/Home industry. A plausible explanation is that Mark Cuban trusts her industry knowledge and has therefore co-invested with her on several occasions when startups from the Lifestyle/Home industry appeared on the show.
Figure 10.2: Bar-chart based distribution of the co-investment percentages for Mark Cuban and Lori Greiner across different industries.
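Sharks exhibit clear co-investment tendencies, indicating collaboration and shared interests among specific pairs. Mark Cuban and Lori Greiner are by far the most frequent pair, and Lifestyle/Home stands out as the sector where Mark's co-investment percentage with Lori exceeds hers. The descriptive statistics also show that typical deal sizes differ between sharks (for example, a median of 100,000 for Barbara Corcoran versus 187,500 for Robert Herjavec), hinting at different investment preferences, although price discrimination itself was not tested directly here. These findings offer entrepreneurs strategic insights into navigating the diverse investment landscape of the show.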
Can we analyze trends in pitchers on the show to identify sectors with the highest capital raised and assess how investment trends impact the business pitches?
This research question, approached from the perspective of entrepreneurs, aims to assist them in identifying the most suitable industry for entrepreneurship. The targeted sector should be capable of attracting sufficient investment and demonstrating potential for future development over time.
# Get all the pitch with successful deals
df_deal = df_shark_tank_1[df_shark_tank_1['Got Deal'] == 1]
# Sum all the deal amount by industry
sum_values = df_deal.groupby('Industry').sum(numeric_only=True).reset_index()
sum_values.sort_values(by=['Total Deal Amount'],ascending=False,inplace=True)
# Dropping unrelated columns
sum_sharks = sum_values[['Industry','Barbara Corcoran Investment Amount','Barbara Corcoran Investment Equity',
'Mark Cuban Investment Amount','Mark Cuban Investment Equity',
'Lori Greiner Investment Amount','Lori Greiner Investment Equity',
'Robert Herjavec Investment Amount','Robert Herjavec Investment Equity',
'Daymond John Investment Amount','Daymond John Investment Equity',
'Kevin O Leary Investment Amount','Kevin O Leary Investment Equity']]
# Calculate the average deal amount by industry
avg_values = df_deal.groupby('Industry').mean(numeric_only=True).reset_index()
avg_values.sort_values(by=['Total Deal Amount'],ascending=False,inplace=True)
# Dropping unrelated columns
avg_sharks = avg_values[['Industry','Barbara Corcoran Investment Amount','Barbara Corcoran Investment Equity',
'Mark Cuban Investment Amount','Mark Cuban Investment Equity',
'Lori Greiner Investment Amount','Lori Greiner Investment Equity',
'Robert Herjavec Investment Amount','Robert Herjavec Investment Equity',
'Daymond John Investment Amount','Daymond John Investment Equity',
'Kevin O Leary Investment Amount','Kevin O Leary Investment Equity']]
The interactive visualization shows how investment is distributed across industries. The dropdown menu at the top of the chart lets users switch between three views by industry, each stacked by investor: Average Deal Amount, Total Deal Amount, and Average Deal Equity.
# Create a new figure for question 3
fig3 = go.Figure()
# List of investors (prefixes of the per-investor columns in avg_sharks / sum_sharks)
investors = ['Barbara Corcoran', 'Mark Cuban', 'Lori Greiner', 'Robert Herjavec', 'Daymond John', 'Kevin O Leary']
# Add a dropdown menu. Each button restyles the six traces (first dict) and updates the layout (second dict).
# 'barmode' is a layout attribute, so it is set in the layout dict rather than in the trace dict.
fig3.update_layout(updatemenus=[dict(
    buttons=[
        dict(label='Average Deal Amount', method='update',
             args=[{'y': [avg_sharks[f'{investor} Investment Amount'] for investor in investors],
                    'x': [avg_sharks['Industry']],
                    'type': 'bar',
                    'name': investors},
                   {'title': 'Average Deal Amount of each investor vs Industry', 'barmode': 'stack'}]),
        dict(label='Total Deal Amount', method='update',
             args=[{'y': [sum_sharks[f'{investor} Investment Amount'] for investor in investors],
                    'x': [sum_sharks['Industry']],
                    'type': 'bar',
                    'name': investors},
                   {'title': 'Total Deal Amount of each investor vs Industry', 'barmode': 'stack'}]),
        dict(label='Average Deal Equity', method='update',
             args=[{'y': [avg_sharks[f'{investor} Investment Equity'] for investor in investors],
                    'x': [avg_sharks['Industry']],
                    'type': 'bar',
                    'name': investors},
                   {'title': 'Average Deal Equity of each investor vs Industry', 'barmode': 'stack'}]),
    ],
    direction='down',
    showactive=True,
    x=0.7,  # position of the dropdown menu
    y=1.1,
    xanchor='left',
    yanchor='top'
)])
# Plot the initial bar chart (average deal amount), one trace per investor
for investor in investors:
    fig3.add_trace(go.Bar(x=avg_sharks['Industry'], y=avg_sharks[f'{investor} Investment Amount'], name=investor))
# Update the title, x labels, background color and size of the figure
fig3.update_layout(title='Average Deal Amount vs Industry', xaxis_tickangle=90, plot_bgcolor='white', width=1000, height=800)
# Ensure the bar plot is stacked
fig3.update_layout(barmode='stack')
fig3.show()
Figure 10.3: Interactively Stacked Bar-chart based distribution for the Average and Total Deal Amounts of each Investor, by different industries.
Inferences:
After analyzing the capital raised by different sectors, we can identify the industries with the strongest investment preferences and market validation. It is also necessary to look at each industry's success rate, because raising more capital does not necessarily mean a higher chance of attracting investment.
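Because Got Deal is a 0/1 flag, an industry's success rate is simply the group mean of that column. Counting the column over all pitches, rather than over df_deal, gives the number of pitches instead of the number of deals. The sketch below is a minimal alternative built on that observation and run on a small hypothetical frame. `industry_deal_stats` is an illustrative helper name and is not part of the original notebook.

```python
import pandas as pd

def industry_deal_stats(pitches, group_col="Industry", deal_col="Got Deal"):
    """Pitches, deals and success rate per group, assuming deal_col holds 0/1 values."""
    stats = pitches.groupby(group_col)[deal_col].agg(
        Pitches="count",      # every pitch in the industry
        Deals="sum",          # counts the 1s, i.e. pitches that got a deal
        SuccessRate="mean",   # share of pitches that got a deal
    )
    return stats.sort_values("SuccessRate", ascending=False)

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "Industry": ["A", "A", "B", "B", "B"],
    "Got Deal": [1, 0, 1, 1, 0],
})
print(industry_deal_stats(toy))
```

On the notebook's data, `industry_deal_stats(df_shark_tank_1)` would return all three figures in a single table.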
# Group by 'Industry' and calculate the success rate
df_success_rate = pd.DataFrame(df_shark_tank_1[df_shark_tank_1['Got Deal'] == 1].groupby('Industry')['Got Deal'].count() / df_shark_tank_1.groupby('Industry')['Got Deal'].count()).reset_index()
# Convert the success rate to percentage format
df_success_rate['Success Deal Rate'] = df_success_rate['Got Deal'].apply(lambda x: f"{x * 100:.2f}%")
# Count the successful deals for each industry (df_deal only holds pitches with Got Deal == 1)
df_cat_count = df_deal.groupby('Industry')['Got Deal'].count().reset_index()
# Merge into one dataframe
df_cat_count = df_cat_count.merge(df_success_rate,on='Industry')
# Rename the deal-count column and drop unrelated columns
df_cat_count['Number of Deals'] = df_cat_count['Got Deal_x']
df_cat_count.drop(['Got Deal_x','Got Deal_y'],inplace=True,axis=1)
# Show the results
df_cat_count
| | Industry | Success Deal Rate | Number of Deals |
|---|---|---|---|
| 0 | Automotive | 76.47% | 13 |
| 1 | Business Services | 48.65% | 18 |
| 2 | Children/Education | 62.39% | 73 |
| 3 | Electronics | 40.00% | 6 |
| 4 | Fashion/Beauty | 56.22% | 122 |
| 5 | Fitness/Sports/Outdoors | 60.18% | 68 |
| 6 | Food and Beverage | 60.22% | 165 |
| 7 | Green/CleanTech | 54.55% | 6 |
| 8 | Health/Wellness | 60.00% | 39 |
| 9 | Lifestyle/Home | 66.67% | 150 |
| 10 | Liquor/Alcohol | 50.00% | 4 |
| 11 | Media/Entertainment | 62.50% | 15 |
| 12 | Pet Products | 58.00% | 29 |
| 13 | Software/Tech | 53.85% | 35 |
| 14 | Travel | 45.45% | 5 |
| 15 | Uncertain/Other | 66.67% | 12 |
The table shows that Lifestyle/Home has a high success rate (66.67%) in attracting the sharks' investment. It also has the second-highest number of deals (150, after Food and Beverage with 165), which suggests it is a prosperous field.
For start-ups seeking a promising industry, we recommend Food and Beverage and Lifestyle/Home. However, this analysis has no temporal dimension. Analyzing trends over seasons could reveal changes in market dynamics and show how real-world investment trends affect business pitches.
# Keep only seasons 10 to 14 (seasons ending 2019 to 2023)
df_best=df_deal[df_deal['Season Number'].isin([10,11,12,13,14])]
# Calculate the total deal amount by industry and season
df_best=df_best.groupby(['Industry','Season End']).sum(numeric_only=True).reset_index()
# Drop unrelated columns
df_best=df_best[['Season End','Industry','Total Deal Amount']]
# Get only Food and Beverage and Lifestyle/Home from all industries
df_best=df_best[(df_best['Industry']=='Food and Beverage') | (df_best['Industry']=='Lifestyle/Home')]
df_best
| | Season End | Industry | Total Deal Amount |
|---|---|---|---|
| 23 | 2019-05-12 | Food and Beverage | 6060000.0 |
| 24 | 2020-05-15 | Food and Beverage | 3305000.0 |
| 25 | 2021-05-21 | Food and Beverage | 6736000.0 |
| 26 | 2022-05-20 | Food and Beverage | 5650000.0 |
| 27 | 2023-05-19 | Food and Beverage | 4585000.0 |
| 33 | 2019-05-12 | Lifestyle/Home | 2030000.0 |
| 34 | 2020-05-15 | Lifestyle/Home | 2895000.0 |
| 35 | 2021-05-21 | Lifestyle/Home | 3155000.0 |
| 36 | 2022-05-20 | Lifestyle/Home | 5325000.0 |
| 37 | 2023-05-19 | Lifestyle/Home | 4910000.0 |
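The trend in this table becomes clearer when the values are pivoted so that each industry is its own column. The sketch below re-enters the ten rows shown above by hand so that it runs on its own. It reports which industry had the larger total in each season and the relative change from the first season to the last. It is an illustration and not part of the original pipeline.

```python
import pandas as pd

# Values copied from the table above (seasons 10 to 14)
rows = [
    ("2019-05-12", "Food and Beverage", 6060000.0),
    ("2020-05-15", "Food and Beverage", 3305000.0),
    ("2021-05-21", "Food and Beverage", 6736000.0),
    ("2022-05-20", "Food and Beverage", 5650000.0),
    ("2023-05-19", "Food and Beverage", 4585000.0),
    ("2019-05-12", "Lifestyle/Home", 2030000.0),
    ("2020-05-15", "Lifestyle/Home", 2895000.0),
    ("2021-05-21", "Lifestyle/Home", 3155000.0),
    ("2022-05-20", "Lifestyle/Home", 5325000.0),
    ("2023-05-19", "Lifestyle/Home", 4910000.0),
]
df_best = pd.DataFrame(rows, columns=["Season End", "Industry", "Total Deal Amount"])

# One row per season, one column per industry
wide = df_best.pivot(index="Season End", columns="Industry", values="Total Deal Amount")
leader = wide.idxmax(axis=1)                 # industry with the larger total in each season
growth = wide.iloc[-1] / wide.iloc[0] - 1    # relative change from first to last season
print(leader)
print(growth.round(3))
```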
Inferences:
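Over seasons 10 to 14, the total deal amount for Lifestyle/Home rose every season from $2.0M (season ending 2019) to $5.3M (2022). At $4.9M, it overtook Food and Beverage ($4.6M) in the season ending 2023, while Food and Beverage fluctuated between $3.3M and $6.7M without a clear trend. More broadly, technology-related pitches consistently attract the highest capital, which shows the influence of tech in entrepreneurship, and the rise in sustainable and socially responsible ventures reflects a shift toward conscious capitalism. Entrepreneurs can use this analysis to align their business ideas with sectors that are attracting increased investor interest.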
What factors in business pitches influence the equity demands of the sharks on the show, and to what extent do the business descriptions affect the likelihood of securing a deal?
# Remove the records that do not have a business description
df_shark_tank_merged = df_shark_tank_merged.dropna(subset=['Pitched_Business_Desc'])
df_shark_tank_merged.head()
| Name | Season Number | Season Start | Season End | Episode Number | Pitch Number | Original Air Date | Industry | Business Description | Pitchers Gender | Pitchers City | ... | Guest Name | Barbara Corcoran Present | Mark Cuban Present | Lori Greiner Present | Robert Herjavec Present | Daymond John Present | Kevin O Leary Present | Pitched_Business_Identifier | Pitched_Business_Desc | Pitched_Business_Identifier_m |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AvatheElephant | 1 | 2009-08-09 | 2010-02-05 | 1 | 1 | 2009-08-09 | Health/Wellness | Ava The Elephant - Baby and Child Care- (Emmy ... | Female | Atlanta | ... | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | Ava the Elephant | (Emmy the Elephant during show, trademarked a... | ava the elephant |
| Mr.Tod'sPieFactory | 1 | 2009-08-09 | 2010-02-05 | 1 | 2 | 2009-08-09 | Food and Beverage | Mr. Tod's Pie Factory - Specialty Food-a pie c... | Male | Somerset | ... | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | Mr. Tod's Pie Factory | a pie company | mr. tod's pie factory |
| Wispots | 1 | 2009-08-09 | 2010-02-05 | 1 | 3 | 2009-08-09 | Business Services | Wispots - Consumer Services-an electronic hand... | Male | Cary | ... | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | Wispots | an electronic hand-held device for waiting roo... | wispots |
| CollegeFoxesPackingBoxes | 1 | 2009-08-09 | 2010-02-05 | 1 | 4 | 2009-08-09 | Lifestyle/Home | College Foxes Packing Boxes - Consumer Service... | Male | Tampa | ... | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | College Foxes Packing Boxes | a packing and organizing service based on an a... | college foxes packing boxes |
| IonicEar | 1 | 2009-08-09 | 2010-02-05 | 1 | 5 | 2009-08-09 | Software/Tech | Ionic Ear - Novelties-an implantable Bluetooth... | Male | St. Paul | ... | Unknown | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | Ionic Ear | an implantable Bluetooth device requiring surg... | ionic ear |
5 rows × 50 columns
Cleaning the Business Descriptions for Further Analysis
The aim is to simplify and clean up the descriptions so that they are easier to understand and analyze. By removing unnecessary or repeated words and focusing on the main points, each business idea is presented in a simple and direct way. This step is crucial because it highlights what is unique and important about each business idea without extra clutter. It also prepares the text for further analysis with Natural Language Processing.
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

STOP_WORDS = set(stopwords.words('english'))  # build the stopword set once instead of once per word
lemmatizer = WordNetLemmatizer()  # reduces words to their root form

def clean_desc(text):
    text = remove_repeats(text)
    words = text.split()
    # Remove common English words. The check is case-sensitive, so capitalised stopwords such as "The" are kept.
    lemmatized_words = [lemmatizer.lemmatize(word) for word in words if word not in STOP_WORDS]
    return ' '.join(lemmatized_words)  # rebuild a clean description

def remove_repeats(text):
    sentences = re.split(r'[.!?]', text)  # split on sentence-ending punctuation
    unique_sentences = []  # sentences kept, in their original order
    seen_sentences = set()  # sentences that have already been seen
    for sentence in sentences:
        sentence = sentence.strip()  # remove surrounding whitespace
        if sentence and sentence not in seen_sentences:
            unique_sentences.append(sentence)
            seen_sentences.add(sentence)
    return '. '.join(unique_sentences).strip() + '.' if unique_sentences else ''  # join the unique sentences
df_shark_tank_merged['Cleaned_Desc'] = df_shark_tank_merged['Pitched_Business_Desc'].apply(clean_desc)
df_shark_tank_merged['Cleaned_Desc']
Name
AvatheElephant (Emmy Elephant show, trademarked Ava after) pl...
Mr.Tod'sPieFactory pie company.
Wispots electronic hand-held device waiting rooms.
CollegeFoxesPackingBoxes packing organizing service based already succe...
IonicEar implantable Bluetooth device requiring surgery...
...
Wine&Design painting class wine served. Wine & Design prov...
Rocketbook notebook scan content cloud service via app er...
LaidBrand hair-care product made pheromones. Laid brand ...
BridalBuddy functional slip worn wedding gown allows weare...
FortMagic building construction toy.
Name: Cleaned_Desc, Length: 639, dtype: object
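The stopword filter in clean_desc compares each word exactly as it appears, and NLTK's English stopword list is all lowercase. As a result, capitalised stopwords such as "The" or "In" survive cleaning. They then reappear among the extracted keywords after the vectorizer lowercases them, for example "the" in the keyword tables further on. The following minimal, self-contained illustration uses a small hypothetical stopword subset in place of the NLTK list.

```python
# Hypothetical subset of an English stopword list (NLTK's full list is also lowercase)
STOPWORDS = {"the", "in", "a", "for", "and"}

def drop_stopwords(text, case_insensitive=False):
    """Remove stopwords from a whitespace-tokenised string."""
    words = text.split()
    if case_insensitive:
        # Compare the lowercased word, so "The" is treated like "the"
        return [w for w in words if w.lower() not in STOPWORDS]
    # Exact comparison, as in clean_desc: "The" is not in the set and is kept
    return [w for w in words if w not in STOPWORDS]

sample = "The drain cover built in the sink"
print(drop_stopwords(sample))
print(drop_stopwords(sample, case_insensitive=True))
```

Lowercasing before the membership test removes these words. The keyword tables in this section were produced without that change.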
The cleaned text from the business pitches is now ready for further analysis. By examining key elements such as the main words used, how easy the text is to read, and the overall tone and subjectivity of the descriptions, we can start to understand what the Sharks look for in the business descriptions presented to them.
This kind of analysis can help us figure out what parts of a business pitch are most important to the Sharks on the show. By studying these factors, we can get a better idea of how the way a business is described might affect its chances of success on the show.
Keyword Extraction
The objective is to extract key industry-specific terms from the business descriptions, identifying unique elements that might have contributed to their success in securing deals.
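TF-IDF (Term Frequency-Inverse Document Frequency) is used here because it helps find important words in the business descriptions. It weighs how often a word appears in a text against how common that word is across all the texts the vectorizer is fit on. This makes it well suited to spotting distinctive words that might have helped a startup get a deal from a Shark. Note that extract_key5 fits the vectorizer on a single combined document per industry. In that setting the inverse-document-frequency factor is the same for every word, so the ranking effectively reduces to term frequency.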
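For comparison, here is a minimal standard-library sketch of TF-IDF in which document frequency is counted across several documents, for example one combined description per industry. Words that are common to every document are pushed down the ranking. The idf term uses the smoothed form that scikit-learn's TfidfVectorizer applies by default, ln((1 + n) / (1 + df)) + 1. The vectorizer's L2 row normalisation is left out because it does not change the ranking within a document. The sketch also tokenises on whitespace only, unlike the vectorizer's regex tokenizer. The function name and toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_top_terms(docs, top_n=3):
    """Return the top_n terms of each document ranked by TF-IDF (whitespace tokenisation)."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: the number of documents each term appears in
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf, same formula as scikit-learn's default (smooth_idf=True)
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    top_terms = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts
        scores = {term: count * idf[term] for term, count in tf.items()}
        # Highest score first; ties broken alphabetically for a stable order
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        top_terms.append(ranked[:top_n])
    return top_terms

docs = ["dog pet leash dog", "pet bed pillow bed", "wine glass wine pet"]
print(tfidf_top_terms(docs))             # "pet" occurs in every document, so it ranks last in each
print(tfidf_top_terms(["pet pet bed"]))  # a single document: every idf value is the same
```

With a single document, as when each industry's text is passed to fit_transform on its own, every idf value equals ln(2/2) + 1 = 1. The ranking is then plain term frequency.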
successful_desc = df_shark_tank_merged[df_shark_tank_merged['Got Deal'] == 1]  # keep only pitches that got a deal
grouped_desc = successful_desc.groupby('Industry')['Cleaned_Desc'].apply(' '.join).reset_index()  # one combined description per industry
def extract_key5(TfidfVec, text, top_n=5):  # return the top_n (default 5) highest-scoring terms
    res = TfidfVec.fit_transform([text])  # fit and transform on this single document
    # get_feature_names_out requires scikit-learn >= 1.0; older versions use get_feature_names instead
    key_arr = np.array(TfidfVec.get_feature_names_out())  # vocabulary terms, aligned with the columns of res
    tf_sort = np.argsort(res.toarray()).flatten()[::-1]  # column indices ordered from highest to lowest score
    top_key = key_arr[tf_sort][:top_n]
    return top_key
TfidfVec = TfidfVectorizer() #initialize the object to assign which words are of more importance than others
grouped_desc['Keywords'] = grouped_desc['Cleaned_Desc'].apply(lambda text: extract_key5(TfidfVec, text)) #apply the function
grouped_desc[['Industry', 'Keywords']]
| | Industry | Keywords |
|---|---|---|
| 0 | Automotive | [car, windows, prevents, light, drop] |
| 1 | Business Services | [waving, translator, small, sign, service] |
| 2 | Children/Education | [baby, kid, service, toy, gold] |
| 3 | Electronics | [ipad, lighting, sound, controlled, music] |
| 4 | Fashion/Beauty | [line, hair, clothing, the, make] |
| 5 | Fitness/Sports/Outdoors | [hand, the, designed, board, bike] |
| 6 | Food and Beverage | [free, made, wine, cheese, based] |
| 7 | Green/CleanTech | [shampoo, moss, peat, ball, use] |
| 8 | Health/Wellness | [posture, device, elephant, back, body] |
| 9 | Lifestyle/Home | [glass, drain, the, perfect, also] |
| 10 | Liquor/Alcohol | [beer, device, beverage, like, canned] |
| 11 | Media/Entertainment | [play, service, light, entertainment, super] |
| 12 | Pet Products | [dog, pet, fresh, patch, grass] |
| 13 | Software/Tech | [phone, service, app, drone, college] |
| 14 | Travel | [solar, powered, luggage, lighting, inflatable] |
| 15 | Uncertain/Other | [vehicle, use, suits, motorized, hydrant] |
Table 10.4.1: Tabular representation of the major keywords in the successful business pitches, across different industries.
The business descriptions from startups in the Automotive industry that secured deals on Shark Tank are rich with terms like car, windows, and light. These words reflect the essence of various pitches on the show, such as an innovative car window that dims automatically to reduce glare, or a lighting system designed for safer night driving. The sharks, recognizing the untapped potential of these fresh and transformative car technology ideas, are eager to invest, seeing the opportunity to tap into a continually expanding and innovation-hungry automotive market. Echoing this enthusiasm, Robert Herjavec sums up the sentiment perfectly: “Some people say, ‘I don’t know why I’m into cars,’ but for me, it was crystal clear." In a world where cars play such a central role in our lives, who doesn't find the prospect of automotive innovation exciting?
Similarly captivating is the realm of Health/Wellness, a sector that is of utmost importance to our lives: our well-being. In this sector, the business descriptions of the startups that got a deal on the show are marked by keywords like posture, device, body. These keywords hint at inventions that could revolutionize the way we approach personal health – from wearables designed to enhance posture to devices focused on improving overall body wellness. The sharks, perceptive to the increasing emphasis on health in our daily lives, see these pitches as more than just business ventures; they view them as gateways to improving human health and lifestyle. This alignment with the burgeoning health and wellness trend showcases the sharks' understanding that investing in health is investing in the future, a sentiment that resonates deeply in today's health-conscious society.
unsuccessful_desc = df_shark_tank_merged[df_shark_tank_merged['Got Deal'] == 0]  # keep only pitches that did not get a deal
grouped_desc = unsuccessful_desc.groupby('Industry')['Cleaned_Desc'].apply(' '.join).reset_index()
grouped_desc['Keywords'] = grouped_desc['Cleaned_Desc'].apply(lambda text: extract_key5(TfidfVec, text)) #apply the function
grouped_desc[['Industry', 'Keywords']]
| | Industry | Keywords |
|---|---|---|
| 0 | Automotive | [truck, rack, bed, invis, cargo] |
| 1 | Business Services | [service, funeral, become, men, planning] |
| 2 | Children/Education | [children, clothing, toy, fun, child] |
| 3 | Electronics | [device, mobile, service, virtual, headphones] |
| 4 | Fashion/Beauty | [clothing, hair, made, line, shirt] |
| 5 | Fitness/Sports/Outdoors | [device, fitness, shoes, bike, barefoot] |
| 6 | Food and Beverage | [wine, ice, made, glass, drink] |
| 7 | Green/CleanTech | [energy, us, grow, indoor, blade] |
| 8 | Health/Wellness | [device, medical, preparation, emergency, music] |
| 9 | Lifestyle/Home | [service, light, christmas, bed, sunscreen] |
| 10 | Liquor/Alcohol | [us, bad, brewery, idea, device] |
| 11 | Media/Entertainment | [music, act, magic, strip, las] |
| 12 | Pet Products | [dog, pet, way, cafe, dogs] |
| 13 | Software/Tech | [app, dating, estate, real, service] |
| 14 | Travel | [air, packed, hotel, service, day] |
| 15 | Uncertain/Other | [umbrella, service, room, rental, elephant] |
Table 10.4.2: Tabular representation of the major keywords in the unsuccessful business pitches, across different industries.
In the Green/CleanTech industry, pitches with words like energy and indoor showed a lot of ideas for new, environmentally-friendly technologies. But these did not always catch the sharks' interest—maybe because they were too specific or not quite ready to hit the big market.
For Pet Products, a lot of pitches talked about things for pets, using words like dog and cafe. The word "dog" popped up a lot, showing just how much we love our furry friends. But love wasn't enough to win over the sharks. These pet ideas often got lost in a sea of similar products, even with their appeal to pet lovers.
In the business ocean of Shark Tank, the investors are like the formidable sharks from the movie "Jaws" - discerning, powerful, and always on the lookout for a compelling opportunity. Just as the shark in "Jaws" navigated the waters with purpose, the sharks here circle around pitches, ready to pounce on those that show real promise. In the realms of Green/CleanTech and Pet Products, a pitch needs more than just a creative splash; it must create significant waves to truly capture the sharks' attention. Without the sharp bite of market potential or the thrilling innovation to make a deep impact, a pitch risks being left behind in these competitive waters. After all, these sharks are not just in for a leisurely swim—they're hunting for the most lucrative catch in the vast ocean of business opportunities.
Industry Specific Analysis
In-depth analysis of a specific industry provides critical insights that a broad overview of all industries may miss. This focused approach allows for a thorough examination of the unique dynamics within a specific industry. This type of analysis reveals not only the nuances of investor preferences specific to that industry, but also the nuances of entrepreneurial strategies that succeed within that space. Furthermore, this focused approach aids in the identification of industry-specific trends that may influence investor decisions.
thumb_up_url = 'https://github.com/JoyceGaoH/project-shark/blob/main/up.jpg?raw=true'
thumb_down_url = 'https://github.com/JoyceGaoH/project-shark/blob/main/down.jpg?raw=true'
thumb_up_response = requests.get(thumb_up_url)
thumb_down_response = requests.get(thumb_down_url)
thumb_up_mask = np.array(Image.open(BytesIO(thumb_up_response.content)))
thumb_down_mask = np.array(Image.open(BytesIO(thumb_down_response.content)))
def extract_key20(text, top_n=20):  # like extract_key5, but returns more keywords for the industry-specific word cloud
    tf_vect = TfidfVectorizer(stop_words='english')  # also drops English stopwords
    tf_matrix = tf_vect.fit_transform([text])
    key_arr = np.array(tf_vect.get_feature_names_out())
    tf_sort = np.argsort(tf_matrix.toarray()).flatten()[::-1]  # highest score first
    return ' '.join(key_arr[tf_sort][:top_n])
def create_wc(text, title, mask=None):
    wordcloud = WordCloud(mask=mask, background_color='white', contour_width=1, contour_color='black').generate(text)
    plt.imshow(wordcloud, interpolation='bilinear')  # bilinear interpolation for a smoother appearance
    plt.axis('off')  # axes are not needed for a word cloud
    plt.title(title)
def process_desc(deal_status, industry, title, mask):  # generalized so that any industry can be chosen
    desc = df_shark_tank_merged[(df_shark_tank_merged['Got Deal'] == deal_status) &
                                (df_shark_tank_merged['Industry'] == industry)]
    combined_desc = ' '.join(desc['Cleaned_Desc'])
    keywords = extract_key20(combined_desc)
    create_wc(keywords, title, mask)  # create the word cloud for the industry
Why have we chosen the Lifestyle/Home Industry?
Based on our conclusion from the previous question, the decision to focus on the Lifestyle/Home industry for specific analysis is well-founded, especially given the significant shift in investment patterns observed after 2020. The pandemic caused significant changes in consumer behavior and priorities, resulting in a renewed emphasis on home and lifestyle products.
With more people spending time at home during the pandemic, demand surged for products that improve home living, covering comfort, convenience, home-office setups, and leisure. The Sharks, always on the lookout for emerging market trends and consumer needs, most likely sensed this spike in demand.
plt.figure(figsize=(10, 8)) #create subplots
plt.subplot(1, 2, 1)
process_desc(1, 'Lifestyle/Home', 'Successful Lifestyle/Home Descriptions', thumb_up_mask) #filled in the chosen industry name
plt.subplot(1, 2, 2)
process_desc(0, 'Lifestyle/Home', 'Unsuccessful Lifestyle/Home Descriptions', thumb_down_mask)
plt.show()
Figure 10.4.1: Wordcloud - based representation of the major keywords in the successful and unsuccessful business pitches.
Successful pitches in this category commonly feature words like perfect, easy, collapsible, and magnetic. Together they paint a picture of products that bring innovation into the home by marrying convenience with clever design. The Sharks, much like us, are drawn to products that promise to simplify daily tasks and offer practical solutions for the modern, efficiency-seeking consumer.
On the flip side, unsuccessful pitches are peppered with terms such as service, climate, device, and christmas, which imply a more specialized or seasonal appeal. These products still offer utility, but their potential for year-round demand or broad market applicability might be limited. The sharks are known for their pragmatism and their keen sense of market trends and consumer behavior. They might therefore be less inclined to invest in products that lack a clear year-round value proposition or that target a narrower audience.
Hypothetical Scenario
To better understand what this word cloud means, consider a hypothetical pitch for a product called SnapFold, a breakthrough that embodies the words perfect, easy, collapsible, and magnetic. SnapFold could be a collapsible, space-saving home organization system with magnetic attachments for versatility. This is the type of innovation that works well in the Lifestyle/Home category on Shark Tank: a product designed to simplify daily life, appealing to customers looking for efficiency and order in their homes. Recognizing its universal appeal, the sharks would be drawn to SnapFold's broad market applicability and potential for year-round sales. This matches their preference for products that solve everyday problems and have an extensive customer base.
Conversely, imagine a product like Golden Hour, a device for enhancing home ambiance during specific times like Christmas. While Golden Hour may be appealing during certain seasons, its limited year-round use may make it less appealing for investment. Regardless of how innovative or appealing a product is during the holiday season, it may not meet the sharks' criteria for an all-season, broad-market product. Using their business acumen, the sharks frequently seek products with the versatility and appeal to generate consistent sales throughout the year. This pragmatic approach reflects their understanding of market trends and consumer preferences, with the goal of investing in products that provide long-term growth and profitability rather than seasonal or niche market spikes.
Although our research indicates that some keywords are more frequently used in winning business pitches on the show, this does not guarantee that utilizing these keywords alone will result in an investment. Even if two startups use the same keywords in their descriptions, their results may still differ. This indicates that using the right keywords is not the only factor in success on the show.
Let's check if this is the case.
lifestyle_data = df_shark_tank_merged[df_shark_tank_merged['Industry'] == 'Lifestyle/Home']
successful_pitches = lifestyle_data[lifestyle_data['Got Deal'] == 1].copy()  # copy to avoid SettingWithCopyWarning
successful_pitches['Keywords'] = successful_pitches['Cleaned_Desc'].apply(lambda text: extract_key5(TfidfVec, text))
unsuccessful_pitches = lifestyle_data[lifestyle_data['Got Deal'] == 0].copy()
unsuccessful_pitches['Keywords'] = unsuccessful_pitches['Cleaned_Desc'].apply(lambda text: extract_key5(TfidfVec, text))  # top 5 keywords per pitch
similar_startups = []  # pairs of successful/unsuccessful startups with overlapping keywords
for index1, row1 in successful_pitches.iterrows():  # compare every successful pitch...
    for index2, row2 in unsuccessful_pitches.iterrows():  # ...with every unsuccessful pitch
        common_keywords = set(row1['Keywords']).intersection(set(row2['Keywords']))  # find the common keywords
        if len(common_keywords) > 1:  # keep startup pairs that share more than one keyword
            similar_startups.append({
                'Successful Startup': row1['Pitched_Business_Identifier'],
                'Unsuccessful Startup': row2['Pitched_Business_Identifier'],
                'Common Keywords': common_keywords
            })
for startup in similar_startups:
    print(f"Successful Startup: {startup['Successful Startup']}, "
          f"Unsuccessful Startup: {startup['Unsuccessful Startup']}, "
          f"Common Keywords: {', '.join(startup['Common Keywords'])}")
Successful Startup: Sweep Easy, Unsuccessful Startup: CropSticks, Common Keywords: in, built
Successful Startup: 180Cup, Unsuccessful Startup: ARKEG, Common Keywords: beer, double
Successful Startup: GeekMyTree, Unsuccessful Startup: Eve Drop, Common Keywords: light, christmas
The results in fact support the theory that securing an investment on the show does not depend on using similar keywords in the business description. Sweep Easy and CropSticks share the keywords built and in, yet only Sweep Easy got a deal. Likewise, 180Cup and ARKEG share double and beer, but only 180Cup got to pop open the bottle. GeekMyTree and Eve Drop share christmas and light, but only GeekMyTree got to celebrate Christmas.
This outcome highlights that although keywords are important for effectively communicating business ideas, they are not the only factor that determines whether investment pitches are successful.
By examining the sentiment and readability of the business descriptions, we attempt to gain a deeper understanding of other elements of a description that could influence an investor's decision.
Flesch Reading Ease is used to assess the readability of the business descriptions, that is, how simple or complex their language is. The formula takes into account the average sentence length and the average number of syllables per word. Higher scores indicate easier-to-read text, while lower scores indicate more complex language. This metric is useful here for checking whether the clarity and simplicity of a startup's business description influence its chances of securing a deal with a Shark.
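Flesch Reading Ease is defined as 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). The sketch below applies that formula to counts supplied by hand. textstat counts words, sentences, and syllables with its own heuristics, so its scores for real descriptions will not necessarily match hand counts. The example counts are hypothetical. Short sentences made of mostly one-syllable words can score above 100, which explains how a description such as TheWallDoctoRX (114.12 in the table further on) can exceed the nominal 0 to 100 range.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Flesch Reading Ease from pre-computed counts (higher = easier to read)."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Hypothetical counts: a 10-word sentence of one-syllable words,
# and a 20-word sentence averaging 1.5 syllables per word
print(round(flesch_reading_ease(10, 1, 10), 3))
print(round(flesch_reading_ease(20, 1, 30), 3))
```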
lifestyle_data = df_shark_tank_merged[df_shark_tank_merged['Industry'] == 'Lifestyle/Home'].copy()
# Flesch Reading Ease formula to calculate the readability
def calc_read(text):
    return textstat.flesch_reading_ease(text)
def calc_senti(text):
    blob = TextBlob(text)  # create a TextBlob object
    return blob.sentiment  # named tuple with polarity and subjectivity
lifestyle_data['Readability'] = lifestyle_data['Pitched_Business_Desc'].apply(calc_read) #apply the functions
lifestyle_data['Sentiment'] = lifestyle_data['Pitched_Business_Desc'].apply(calc_senti)
lifestyle_data['Polarity'] = lifestyle_data['Sentiment'].apply(lambda x: round(x.polarity, 2)) #rounding off to 2 decimal points
lifestyle_data['Subjectivity'] = lifestyle_data['Sentiment'].apply(lambda x: round(x.subjectivity, 2))
lifestyle_data_scores = lifestyle_data[['Got Deal', 'Readability', 'Polarity', 'Subjectivity']]
lifestyle_data_scores_1 = lifestyle_data_scores.sort_values('Got Deal', ascending=False).reset_index() #sort according to success
lifestyle_data_scores_1
| | Name | Got Deal | Readability | Polarity | Subjectivity |
|---|---|---|---|---|---|
| 0 | PeoplesDesign | 1 | 48.50 | 0.22 | 0.67 |
| 1 | Socktabs | 1 | 74.49 | -0.20 | 0.05 |
| 2 | OneLifeProducts | 1 | 55.91 | 0.00 | 0.00 |
| 3 | Insta-Fire | 1 | 62.17 | 0.11 | 0.22 |
| 4 | TheWallDoctoRX | 1 | 114.12 | 0.43 | 0.83 |
| ... | ... | ... | ... | ... | ... |
| 91 | WeddingWagon | 0 | 88.74 | 0.00 | 0.00 |
| 92 | TableJacks | 0 | 90.77 | 0.00 | 0.00 |
| 93 | StormStoppers | 0 | 56.25 | 0.43 | 0.73 |
| 94 | EveDrop | 0 | 75.10 | 0.13 | 0.54 |
| 95 | TheFloatingMugCo. | 0 | 73.81 | -0.01 | 0.74 |
96 rows × 5 columns
The scores are available, but what can we observe from a mere table? A visualization certainly helps a lot.
Scatterplots will allow us to discern potential patterns or correlations between the numerical scores assigned to each pitch and the outcome of the pitch (whether a deal was made).
For instance, by plotting 'Readability' against 'Got Deal', we can evaluate if more easily readable pitches tend to have a higher success rate. Similarly, scatterplots of 'Polarity' and 'Subjectivity' against 'Got Deal' could reveal if pitches with certain emotional tones or levels of personal opinion are more likely to secure an investment.
Scatterplot Visualization
fig = make_subplots(
rows=1, cols=3,
subplot_titles=('Readability vs Deal Success', #assign titles accordingly
'Polarity vs Deal Success',
'Subjectivity vs Deal Success')
)
fig.add_trace(
go.Scatter(x=lifestyle_data_scores_1['Got Deal'], y=lifestyle_data_scores_1['Readability'],
mode='markers', name='Readability', #display each point distinctly
text=lifestyle_data_scores_1['Name'],
hoverinfo='text+y'), #set what needs to be displayed when hovered over
row=1, col=1
)
fig.add_trace(
go.Scatter(x=lifestyle_data_scores_1['Got Deal'], y=lifestyle_data_scores_1['Polarity'],
mode='markers', name='Polarity',
text=lifestyle_data_scores_1['Name'],
hoverinfo='text+y'),
row=1, col=2
)
fig.add_trace(
go.Scatter(x=lifestyle_data_scores_1['Got Deal'], y=lifestyle_data_scores_1['Subjectivity'],
mode='markers', name='Subjectivity',
text=lifestyle_data_scores_1['Name'],
hoverinfo='text+y'),
row=1, col=3
)
fig.update_layout(
height=600, width=1000,
title_text='Analysis of Readability, Polarity, Subjectivity vs Deal Success',
showlegend=False #a legend is not required for this plot
)
for i in range(1, 4):
fig.update_xaxes(type='category', categoryarray=[0, 1], row=1, col=i) #set only 0 and 1 to appear on the x-axis
fig.show()
Figure 10.4.2: Scatterplot based distribution and analysis of readability, polarity and subjectivity scores of the pitches versus successful deals.
Readability vs Deal Success
Most points are clustered above a readability score of 20. This suggests that descriptions on the show are generally readable and that readability may have a positive influence on a deal's success. However, both successful and unsuccessful deals appear across a wide range of readability scores. So while readability may be important, it is not the sole factor determining a deal's success.
Polarity vs Deal Success
Polarity scores, which indicate sentiment, are spread around the midpoint with a slight concentration of points with positive polarity. There's a mix of successful and unsuccessful deals across the range of polarity scores, which suggests that neither positive nor negative sentiment strongly predicts whether a deal will be successful.
Subjectivity vs Deal Success
Subjectivity scores are mostly positive, and there's a relatively even distribution across successful and unsuccessful deals. This indicates that subjectivity, or the presence of personal opinions in the pitch, does not show a clear correlation with the outcome of the deal.
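To put numbers on these visual impressions, one option is to compare the mean of each score across the two outcome groups and to correlate each score with the 0/1 'Got Deal' column (a Pearson correlation against a binary variable is the point-biserial correlation). The sketch below is a minimal illustration under assumptions: the DataFrame is a hypothetical stand-in for `lifestyle_data_scores_1` with the same column names, and its values are made up rather than taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for lifestyle_data_scores_1; values are illustrative only
scores = pd.DataFrame({
    "Got Deal":     [1, 1, 1, 0, 0, 0],
    "Readability":  [40.0, 35.0, 30.0, 38.0, 25.0, 20.0],
    "Polarity":     [0.2, 0.1, 0.0, 0.3, 0.0, -0.1],
    "Subjectivity": [0.5, 0.4, 0.0, 0.6, 0.2, 0.3],
})

score_cols = ["Readability", "Polarity", "Subjectivity"]

# Mean of each score for unsuccessful (0) and successful (1) pitches
group_means = scores.groupby("Got Deal")[score_cols].mean()

# Point-biserial correlation: Pearson correlation of each score with the 0/1 outcome
point_biserial = scores[score_cols].corrwith(scores["Got Deal"])

print(group_means.round(2))
print(point_biserial.round(3))
```

On the real data, correlations close to zero would support the reading above that none of the three scores separates deals from non-deals on its own; the sign and size should be checked rather than assumed.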
Let's look at one last thing before concluding this part:
Are there any cases where two startups had the same Readability, Polarity and Subjectivity scores but ended up on opposite sides of the success spectrum?
import pandas as pd
mixed_outcome_rows = [] #create a list to collect the matching rows
for index, row in lifestyle_data_scores.iterrows(): #iterate through each row
    similar_rows = lifestyle_data_scores_1[(lifestyle_data_scores_1['Readability'] == row['Readability']) & #find pitches with identical scores
                                           (lifestyle_data_scores_1['Polarity'] == row['Polarity']) &
                                           (lifestyle_data_scores_1['Subjectivity'] == row['Subjectivity'])]
    #check if more than one pitch shares these scores and they had different Got Deal values
    if len(similar_rows) > 1 and similar_rows['Got Deal'].nunique() > 1:
        mixed_outcome_rows.extend(similar_rows.to_dict('records')) #add the matching rows to the list
mixed_outcome_df = pd.DataFrame(mixed_outcome_rows).drop_duplicates() #drop duplicate rows after converting to a DataFrame
mixed_outcome_df
| | Name | Got Deal | Readability | Polarity | Subjectivity |
|---|---|---|---|---|---|
| 0 | ModMomFurniture | 1 | 34.59 | 0.0 | 0.0 |
| 1 | SustyParty | 0 | 34.59 | 0.0 | 0.0 |
| 2 | MonkeyMat | 1 | 32.56 | 0.0 | 0.0 |
| 3 | TheHeatHelper | 0 | 32.56 | 0.0 | 0.0 |
Oh well, not just one case but two, within the Lifestyle/Home industry pitches alone. What are the odds?
ModMomFurniture and SustyParty have identical scores (readability 34.59, polarity 0.0, subjectivity 0.0), yet their outcomes on "Shark Tank" diverged. Despite describing their businesses with the same level of clarity, as indicated by the equal readability scores, one secured a deal while the other did not.
Similarly, MonkeyMat and TheHeatHelper share the same scores (readability 32.56, polarity 0.0, subjectivity 0.0), but again, one was successful in getting a deal and the other wasn't. This outcome is intriguing because it challenges the assumption that, with sentiment and subjectivity held equal, the clarity of a pitch's description would have a consistent impact on investment decisions.
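The row-by-row loop above compares every pitch against the whole table, which is quadratic in the number of pitches and produces repeated rows that then need `drop_duplicates()`. A minimal alternative sketch groups the pitches by the three score columns and keeps only groups that contain both outcomes. The first four rows reuse the names, outcomes and scores from the table above; `OtherCo` is a hypothetical pitch added to show a row that gets filtered out.

```python
import pandas as pd

# First four rows mirror the table above; "OtherCo" is hypothetical
pitches = pd.DataFrame({
    "Name": ["ModMomFurniture", "SustyParty", "MonkeyMat", "TheHeatHelper", "OtherCo"],
    "Got Deal": [1, 0, 1, 0, 1],
    "Readability": [34.59, 34.59, 32.56, 32.56, 50.10],
    "Polarity": [0.0, 0.0, 0.0, 0.0, 0.25],
    "Subjectivity": [0.0, 0.0, 0.0, 0.0, 0.40],
})

score_cols = ["Readability", "Polarity", "Subjectivity"]

# Keep score combinations shared by more than one pitch AND seen with both outcomes
mixed_outcomes = pitches.groupby(score_cols).filter(
    lambda g: len(g) > 1 and g["Got Deal"].nunique() > 1
)

print(mixed_outcomes)
```

`groupby(...).filter(...)` returns the original rows of every qualifying group, each row at most once, so no de-duplication step is needed. Grouping on the exact float values mirrors the `==` comparison used in the loop.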
One factor down, many to go!
Money is the crux of the show, and arguably of most things in life. But is it the only factor that determines whether a deal goes through?
startup_det = ['Mod Mom Furniture', 'Susty Party']
selected_startups = df_shark_tank_merged[
df_shark_tank_merged['Pitched_Business_Identifier'].isin(startup_det) #check if the names are in the dataframe
][['Pitched_Business_Identifier', 'Original Ask Amount', 'Original Offered Equity', 'Valuation Requested']] #select the financial asks
selected_startups
| Name | Pitched_Business_Identifier | Original Ask Amount | Original Offered Equity | Valuation Requested |
|---|---|---|---|---|
| ModMomFurniture | Mod Mom Furniture | 90000 | 25.0 | 360000 |
| SustyParty | Susty Party | 250000 | 10.0 | 2500000 |
Mod Mom Furniture
Requested $90,000 for a 25% equity stake, valuing the business at $360,000. This relatively modest ask suggests a smaller-scale operation or a startup in its earlier stages. The higher equity offering (25%) indicates a willingness to give up a significant share of the business, possibly reflecting the entrepreneur's need for substantial investment or a strategic partnership.
Susty Party
Asked for a considerably higher $250,000 but offered only 10% equity, valuing the company at a substantial $2,500,000. The higher valuation and smaller equity offer suggest a more established business, potentially with higher revenues or a more significant market presence. This reflects confidence in the business's value, but it also means an investor would get a smaller piece of the company for more money.
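The 'Valuation Requested' figures follow directly from the ask and the equity offered: valuation = ask amount ÷ (equity % ÷ 100), for example $90,000 ÷ 0.25 = $360,000. A minimal sketch of that arithmetic, using a hypothetical helper named `implied_valuation`, reproduces both rows of the table above.

```python
def implied_valuation(ask_amount: float, equity_pct: float) -> float:
    """Company valuation implied by asking `ask_amount` dollars for `equity_pct` percent."""
    if not 0 < equity_pct <= 100:
        raise ValueError("equity_pct must be in the range (0, 100]")
    # valuation = ask / (equity / 100), written as ask * 100 / equity
    return ask_amount * 100 / equity_pct


print(implied_valuation(90_000, 25))   # Mod Mom Furniture -> 360000.0
print(implied_valuation(250_000, 10))  # Susty Party -> 2500000.0
```

Writing the formula as `ask * 100 / equity` keeps these round figures exact in floating point.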
Mod Mom Furniture's approach might appeal to sharks interested in a larger stake in an early-stage company. In this particular case, the financial asks appear to have played a part in the Sharks' decision on whether to invest.
But is this the case for every startup, or are there other factors that change the Sharks' decision to invest?
Let's take a closer look at one of the most iconic misses in Shark Tank's history to date.
Why did the Sharks choose not to invest in DoorBot?
DoorBot, now known as Ring, is a classic example of a squandered opportunity that caused a stir in the venture capital community, notably among the sharks on Shark Tank. When Jamie Siminoff pitched DoorBot in 2013, the only offer came from Kevin O'Leary, and Siminoff ultimately turned it down, leaving the show without a deal. Fast forward to today, and the rebranded Ring has become a ubiquitous presence in American homes, recognized for its smart doorbells that allow homeowners to see who's at the door from anywhere.
Although DoorBot failed to secure a deal on the show, Amazon's 2018 acquisition of Ring for more than $1 billion made it one of the most successful businesses ever to appear on Shark Tank. The acquisition not only proved the product's commercial viability but also highlighted the sharks' costly miss. Mark Cuban has publicly said that he has no regrets about not investing in DoorBot, but given the company's phenomenal success, it's difficult to imagine there isn't at least a twinge of regret. Ring's path from a rejected pitch to a household name is a powerful reminder of how volatile startups are and how keen an eye it takes to recognize a diamond in the rough. It is a story that continues to captivate aspiring entrepreneurs and investors alike, and it represents an important moment in Shark Tank history.
Was this decision down to DoorBot's financial ask not being to the Sharks' taste?
doorbot_det = df_shark_tank_merged[
df_shark_tank_merged['Pitched_Business_Identifier'] == 'DoorBot' #filter the details of the company DoorBot
][['Original Ask Amount', 'Original Offered Equity', 'Valuation Requested', 'Got Deal']]
doorbot_det
| Name | Original Ask Amount | Original Offered Equity | Valuation Requested | Got Deal |
|---|---|---|---|---|
| DoorBot | 700000 | 10.0 | 7000000 | 0 |
Previously, we saw that startups with identical Readability, Polarity and Subjectivity scores still ended up on opposite sides of the success spectrum. Could the same be true of the financials?
To find out, we examine comparable Lifestyle/Home pitches that did secure a deal, with an ask amount and offered equity each within ±20% of DoorBot's. The aim is to determine whether DoorBot's financial ask was within a typical range for successfully funded pitches or whether it was an outlier.
doorbot_row = df_shark_tank_merged[df_shark_tank_merged['Pitched_Business_Identifier'] == 'DoorBot'].iloc[0] #DoorBot's record
doorbot_ask = doorbot_row['Original Ask Amount']
doorbot_equity = doorbot_row['Original Offered Equity']
range_factor = 0.20 #define range as 20%
min_ask = doorbot_ask * (1 - range_factor) #20% less than DoorBot's ask
max_ask = doorbot_ask * (1 + range_factor) #20% more than DoorBot's ask
min_equity = doorbot_equity * (1 - range_factor)
max_equity = doorbot_equity * (1 + range_factor)
similar_fin_ask = df_shark_tank_merged[ #funded Lifestyle/Home pitches with a similar ask and equity
(df_shark_tank_merged['Industry'] == 'Lifestyle/Home') &
(df_shark_tank_merged['Original Ask Amount'].between(min_ask, max_ask)) &
(df_shark_tank_merged['Original Offered Equity'].between(min_equity, max_equity)) &
(df_shark_tank_merged['Got Deal'] == 1)
]
similar_fin_ask[['Original Ask Amount', 'Original Offered Equity', 'Valuation Requested', 'Got Deal']]
| Name | Original Ask Amount | Original Offered Equity | Valuation Requested | Got Deal |
|---|---|---|---|---|
| KeenHome | 750000 | 10.0 | 7500000 | 1 |
Voila! So even the financial asks are not a defining factor on their own.
KeenHome and DoorBot pitched their ventures on very similar financial terms: DoorBot asked for $700,000 for 10% equity (a $7,000,000 valuation) and KeenHome for $750,000 for 10% equity (a $7,500,000 valuation). KeenHome secured a deal while DoorBot did not, an outcome that may initially seem perplexing given the comparable asks and valuations. This scenario echoes the earlier contrast between ModMomFurniture and SustyParty, where identical readability scores led to different outcomes.
The striking difference in the fates of KeenHome and DoorBot, despite similar financial propositions, reiterates the multifaceted nature of investment decisions on "Shark Tank". It suggests that while financials are critical, they are not the sole determinant of success, much like the Readability, Polarity and Subjectivity scores.
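The ±20% window used in the filter above can also be written as a small dependency-free helper, which makes the bounds explicit. For DoorBot's $700,000 ask and 10% equity, the window is $560,000 to $840,000 and 8% to 12%, and both ends are inclusive, matching the default of pandas' `Series.between`. The `Pitch` record and the `similar_pitches` function are hypothetical names; KeenHome's terms come from the table above, while the other two pitches are made up to show exclusions. Unlike the pandas filter, this sketch does not restrict by industry or deal outcome.

```python
from typing import List, NamedTuple


class Pitch(NamedTuple):
    name: str
    ask: float      # Original Ask Amount, in dollars
    equity: float   # Original Offered Equity, in percent
    got_deal: int   # 1 if a deal was made, else 0


def similar_pitches(pitches: List[Pitch], ref_ask: float, ref_equity: float,
                    tolerance: float = 0.20) -> List[Pitch]:
    """Pitches whose ask AND equity both lie within ±tolerance of the reference (inclusive)."""
    min_ask, max_ask = ref_ask * (1 - tolerance), ref_ask * (1 + tolerance)
    min_equity, max_equity = ref_equity * (1 - tolerance), ref_equity * (1 + tolerance)
    return [p for p in pitches
            if min_ask <= p.ask <= max_ask and min_equity <= p.equity <= max_equity]


# KeenHome's terms are from the table above; the other two pitches are hypothetical
candidates = [
    Pitch("KeenHome", 750_000, 10.0, 1),
    Pitch("HypotheticalA", 1_000_000, 10.0, 1),  # ask above the $840,000 upper bound
    Pitch("HypotheticalB", 700_000, 20.0, 1),    # equity above the 12% upper bound
]
matches = similar_pitches(candidates, ref_ask=700_000, ref_equity=10.0)  # DoorBot's terms
print([p.name for p in matches])  # ['KeenHome']
```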
If not these two, what other factors might affect the Sharks' decision?
startup_det = ['DoorBot', 'Keen Home'] #selecting the 2 startups
selected_startups = df_shark_tank_merged[
df_shark_tank_merged['Pitched_Business_Identifier'].isin(startup_det)
][['Season Number', 'Pitchers Gender', 'Pitchers City', 'Pitchers State', 'Multiple Entrepreneurs']] #compare other pitcher characteristics
selected_startups
| Name | Season Number | Pitchers Gender | Pitchers City | Pitchers State | Multiple Entrepreneurs |
|---|---|---|---|---|---|
| DoorBot | 5 | Male | Santa Monica | CA | 0 |
| KeenHome | 6 | Male | New York | NY | 1 |
From the table 'Distribution of Pitches by different States', we saw that California has a higher success rate than New York, yet KeenHome (New York) got an investment on the show and DoorBot (California) did not. Such is the nature of the show: there is no definitive factor. Since this view is still an assumption, we will look at a few more factors to test it.
The next factor to consider: have some seasons of the show been better than others for startups from the Lifestyle/Home industry?
lifestyle_home_data = df_shark_tank_merged[df_shark_tank_merged['Industry'] == 'Lifestyle/Home']
seasonal_success_rate = lifestyle_home_data.groupby('Season Number')['Got Deal'].mean() #calculate the success rate
spec_succ_rate = seasonal_success_rate[seasonal_success_rate.index.isin([5, 6])] #filter only season 5&6
success_rate_df = spec_succ_rate.to_frame(name='Success Rate')
success_rate_df['Success Rate'] = success_rate_df['Success Rate'].apply(lambda x: f"{x * 100:.2f}%") #format for better readability
success_rate_df
| Season Number | Success Rate |
|---|---|
| 5 | 57.14% |
| 6 | 59.09% |
The season does not appear to explain the Sharks' decision here. Although the two startups were featured in different seasons (DoorBot in Season 5 and KeenHome in Season 6), Lifestyle/Home pitches had similar success rates in both seasons (57.14% and 59.09%), suggesting that the timing of their appearance on the show did not matter much.
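A success rate is easier to judge next to the number of pitches behind it, because a rate built on a handful of pitches can move a lot with a single deal. A minimal sketch, assuming the same 'Season Number' and 'Got Deal' columns as `lifestyle_home_data` but using made-up rows, reports the pitch count, deal count and rate in one table.

```python
import pandas as pd

# Hypothetical Lifestyle/Home pitches; NOT the real season data
pitches = pd.DataFrame({
    "Season Number": [5, 5, 5, 6, 6, 6, 6],
    "Got Deal":      [1, 0, 1, 1, 1, 0, 0],
})

season_summary = pitches.groupby("Season Number")["Got Deal"].agg(
    Pitches="count",      # number of pitches per season
    Deals="sum",          # number of deals (Got Deal is 0/1)
    Success_Rate="mean",  # share of pitches that ended in a deal
)
print(season_summary)
```

The same named-aggregation pattern appears in the Multiple Entrepreneurs analysis; applied to `lifestyle_home_data`, it would show how many pitches sit behind the 57.14% and 59.09% figures.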
Does the presence of Multiple Entrepreneurs play any role?
multi_ent_effect = lifestyle_home_data.groupby('Multiple Entrepreneurs')['Got Deal'].agg(
Total_Pitches='count', #number of pitches in each group (0 = single entrepreneur, 1 = multiple)
Deals_Made='sum' #count the total number of deals that went through
)
multi_ent_effect['Success Rate'] = ((multi_ent_effect['Deals_Made'] / multi_ent_effect['Total_Pitches']) * 100).round(2).astype(str) + '%'
multi_ent_effect
| Multiple Entrepreneurs | Total_Pitches | Deals_Made | Success Rate |
|---|---|---|---|
| 0 | 78 | 45 | 57.69% |
| 1 | 18 | 14 | 77.78% |
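Pitches with more than one entrepreneur had a noticeably higher success rate (77.78% across 18 pitches) than those with a single entrepreneur (57.69% across 78 pitches), so having more than one person represent the startup seems to have a positive effect on the Sharks. Even so, the multi-entrepreneur group is small, and it cannot be concluded that any startup represented by more than one person will walk away from the show with an investment.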
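One way to check whether the gap between 57.69% and 77.78% could plausibly be chance is a two-proportion z-test with a pooled standard error. The sketch below uses a hypothetical helper built on the standard library only, fed with the counts from the table above. With these counts, z comes out around 1.58 and the two-sided p-value around 0.11, so the difference is not significant at the conventional 5% level, which fits the cautious reading above. With only 18 multi-entrepreneur pitches the normal approximation is rough, and an exact test (for example Fisher's) would be a sensible cross-check.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions (pooled SE).

    Relies on the normal approximation, which is rough for small groups.
    """
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Counts from the table above: single entrepreneur (45/78) vs multiple (14/18)
z, p = two_proportion_z_test(45, 78, 14, 18)
print(f"z = {z:.2f}, p = {p:.3f}")
```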
Based on all of our analysis:
Shark Tank is a grand stage on which entrepreneurs and investors compete in a complex game. It's not just about who has the best idea or who requests the most money. Our quest to understand this game took us through a world of words, where we discovered which ones were frequently used in successful pitches. Words like 'perfect' and 'easy' appeared frequently, but even these magical words weren't always enough to guarantee success.
We also looked at how easy the pitches were to understand, how positive or negative they were, and how much they were based on facts versus opinions. Surprisingly, these things didn't always matter much. Even asking for the right amount of money, like DoorBot did, didn't always mean you'd get a deal. We saw that whether it was a man or a woman pitching, or where they were from, didn't make a big difference either.
In the end, it seems there is no single secret recipe for winning over the sharks. It is about a mix of things: having a great idea, presenting it well, and sometimes just going with your gut feeling. For the sharks, it is not always about the numbers; it's also about the story behind the idea, the person who's pitching, and sometimes just the excitement of the moment.
Valuation, realistic revenue forecasts, and effective negotiation skills significantly impact equity negotiations. Well-supported valuations and negotiation prowess contribute to securing more favourable deals. This information guides entrepreneurs in preparation and emphasizes the importance of strategic communication during the pitching process.
Does the presence of specific investors/guests on the show influence entrepreneurs' deal success and viewership, and who has the most significant impact on both?
import pandas as pd
# Correct misspelled guest names to show accurate counts
# (assign the result back: a chained inplace replace on a column stops updating the DataFrame under pandas' Copy-on-Write mode)
df_shark_tank_1['Guest Name'] = df_shark_tank_1['Guest Name'].replace({
    'Daniel Lubetzsky': 'Daniel Lubetzky',
    'Nirv Tolia': 'Nirav Tolia',
})
# Group by guest name; count() counts the non-null 'Success' values for each guest
guest_success = df_shark_tank_1.groupby('Guest Name')['Success'].count()
guest_view = df_shark_tank_1.groupby('Guest Name')['US Viewership'].mean() # group by guest name and average viewership
# Create a summary data frame
guest_summary = pd.DataFrame({
'Successful Deals': guest_success,
'Average Viewership': guest_view
})
guest_summary
| Guest Name | Successful Deals | Average Viewership |
|---|---|---|
| Alex Rodriguez | 8 | 3.978750 |
| Alli Webb | 2 | 4.130000 |
| Anne Wojcicki | 1 | 3.300000 |
| Ashton Kutcher | 2 | 5.855000 |
| Bethenny Frankel | 2 | 4.030000 |
| Blake Mycoskie | 1 | 4.030000 |
| Charles Barkley | 3 | 3.613333 |
| Chris Sacca | 8 | 5.763750 |
| Daniel Lubetzky | 15 | 3.991333 |
| Emma Grede | 7 | 3.570000 |
| Gwyneth Paltrow | 2 | 3.885000 |
| Jamie Siminoff | 2 | 3.355000 |
| Jeff Foxworthy | 1 | 4.580000 |
| John Paul DeJoria | 1 | 7.310000 |
| Katrina Lake | 1 | 4.340000 |
| Kendra Scott | 5 | 4.186000 |
| Kevin Harrington | 5 | 4.992000 |
| Kevin Hart | 3 | 4.256667 |
| Maria Sharapova | 1 | 4.140000 |
| Matt Higgins | 4 | 3.622500 |
| Nick Woodman | 2 | 7.475000 |
| Nirav Tolia | 3 | 3.726667 |
| Peter Jones | 4 | 3.662500 |
| Richard Branson | 3 | 4.826667 |
| Rohan Oza | 8 | 4.251250 |
| Sara Blakely | 4 | 3.622500 |
| Steve Tisch | 1 | 7.490000 |
| Tony Xu | 3 | 4.040000 |
| Troy Carter | 2 | 5.840000 |
Table 10.5.1: Tabular representation of the guest names, the number of successful business deals they were part of, and their average US viewership (in millions).
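In pandas, `count()` returns the number of non-null values in each group, whereas `sum()` on a 0/1 column returns the number of 1s. The 'Successful Deals' column above is built with `count()`, so it equals the number of successful deals only if every counted row is a success (or if 'Success' is empty for pitches without a deal); how 'Success' is coded in `df_shark_tank_1` is an assumption not confirmed here. A toy example with hypothetical guests shows how the two aggregations can diverge.

```python
import pandas as pd

# Hypothetical rows: 'Success' coded as 1 (deal) / 0 (no deal)
pitches = pd.DataFrame({
    "Guest Name": ["Guest A", "Guest A", "Guest B", "Guest B"],
    "Success":    [1, 0, 1, 1],
})

by_guest = pitches.groupby("Guest Name")["Success"]
appearances = by_guest.count()  # non-null 'Success' values = pitches per guest
deals = by_guest.sum()          # only the 1s = successful deals per guest

print(pd.DataFrame({"Pitches": appearances, "Successful Deals": deals}))
```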
import matplotlib.pyplot as plt
guest_summary = guest_summary.sort_values(by='Successful Deals', ascending=False) # sort successful deals in descending order
# Plotting
fig, ax1 = plt.subplots(figsize=(12, 8))
# Plot axis 1: bar chart for successful deals/appearances
color = 'tab:purple'
ax1.set_xlabel('Guest Name')
ax1.set_ylabel('Successful Deals', color=color)
ax1.bar(guest_summary.index, guest_summary['Successful Deals'], color=color)
ax1.tick_params(axis='y', labelcolor=color)
# Rotate the guest names so they do not overlap
ax1.set_xticks(guest_summary.index)
ax1.set_xticklabels(guest_summary.index, rotation=80, ha='right')
# Create a plot for axis 2: line graph (with point markers) for average viewership
ax2 = ax1.twinx()
color = 'tab:orange'
ax2.set_ylabel('Average Viewership (in millions)', color=color)
ax2.plot(guest_summary.index, guest_summary['Average Viewership'], color=color, marker='o')
ax2.tick_params(axis='y', labelcolor=color)
# Set the title before tight_layout() so the layout leaves room for it
ax1.set_title('Impact of Guests on Successful Deals and Viewership')
fig.tight_layout()
# Display plot
plt.show()
Figure 10.5: Bar chart and line graph showing the impact of guests on successful business deals and on overall viewership.
It is observed that every deal was successful when a guest appeared on the show. Because the outcome never varies, this is not a meaningful correlation, and not much can be deduced about the impact of guests on deal success.
Daniel Lubetzky has the highest number of appearances. For years, he enjoyed watching ABC's Shark Tank with his family, using the opportunity to teach his kids lessons about entrepreneurship. His long-standing enthusiasm for the show may help explain his repeated appearances.
The three guests with the highest average viewership, Steve Tisch (7.49 million), Nick Woodman (7.475 million) and John Paul DeJoria (7.31 million), have a combined average of 7.43 million viewers, which suggests that their guest appearances have had a notable impact on the show's viewership.
This could imply that these particular guests attract a substantial audience or contribute to the overall appeal of the show when they appear. However, each of them appears in only one or two deals, and viewership also varies from season to season, so these averages may partly reflect when their episodes aired.
Certain guests appear to be associated with higher show viewership. Collaborations between industry guests and sharks may also attract a larger audience and give entrepreneurs strategic insights for maximizing their chances of success, although the data says little about their effect on deal success. Given their backgrounds, it is plausible that the high combined average viewership of these three guests is influenced by their individual successes, brand recognition, and the unique perspectives they bring to the entrepreneurial discussions on Shark Tank.
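The 7.43 million figure is the unweighted mean of the three highest per-guest averages in Table 10.5.1: (7.49 + 7.475 + 7.31) ÷ 3 ≈ 7.43. A minimal sketch that derives it with `nlargest`, using a few rows copied from the table in place of the full `guest_summary` DataFrame:

```python
import pandas as pd

# A few rows copied from Table 10.5.1 (average US viewership, in millions)
guest_summary = pd.DataFrame(
    {"Successful Deals": [2, 1, 1, 15],
     "Average Viewership": [7.475, 7.31, 7.49, 3.991333]},
    index=pd.Index(["Nick Woodman", "John Paul DeJoria", "Steve Tisch", "Daniel Lubetzky"],
                   name="Guest Name"),
)

# Three guests with the highest average viewership, and the unweighted mean of their averages
top3 = guest_summary.nlargest(3, "Average Viewership")
top3_mean = top3["Average Viewership"].mean()

print(top3.index.tolist())  # ['Steve Tisch', 'Nick Woodman', 'John Paul DeJoria']
print(f"{top3_mean:.3f}")
```

Because this is a simple mean of per-guest averages, weighting each guest by the number of deals they appear in would give a slightly different figure.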
The value of timing for pitch success is one of the study's notable findings; December turns out to be an especially good month. This timing trend may be connected to seasonal influences and larger economic cycles. The investment environment is changing as well: there is a noticeable shift towards pitches that are more technology-focused and place a greater emphasis on sustainability, reflecting the current wave of conscious capitalism. This change reflects shifting investor objectives as well as a wider upheaval in the entrepreneurial environment.
Sharks' co-investment habits reveal which partners they prefer to invest alongside, giving entrepreneurs useful cues for tailoring their proposals. Securing favorable equity terms also highlights the importance of well-prepared valuations and strong negotiation skills.
Furthermore, certain guests and sharks are associated with noticeably higher viewership, and possibly with transaction success, although the guest data alone cannot confirm the latter. Tailoring pitches to these influential people may improve an entrepreneur's prospects of success, as well as influencing viewer engagement with the show.
This study sheds light on the essential elements of Shark Tank success, highlighting the need for entrepreneurs to choose their industry and timing carefully. It also explores how the program has responded to the changing media environment, namely how it has adjusted to on-demand streaming, and the noteworthy influence of guest stars. This shift in viewership trends raises the possibility that the program must adapt its format to better suit contemporary tastes in entertainment. The research highlights the importance of agility in adjusting to customer tastes and market changes, which is useful not only for Shark Tank producers but also for prospective contestants. It helps bridge the gap between entertainment and real-world business acumen, making it a valuable tool for understanding the dynamics of success in the rapidly evolving field of entrepreneurship on TV. It offers thorough insights into entrepreneurial television, providing direction for the show's producers as well as its prospective viewers.
Under the guidance of: Prof. John Bono
Submission Date: 12/06/2023